Operator weak values have emerged, within the so-called Two-Vector Formulation of
Quantum Mechanics, as a way of characterizing the physical properties of a quantum 
system in the time interval between two ideal complete measurements. Such weak 
values can be defined operationally in terms of the weak measurement scheme, a 
non-ideal variation of the standard von Neumann scheme in which the disturbance
of the system is minimized at the expense of statistical significance on a single trial. 
So far, however, no connection has been established between weak values and the 
results of measurements that fall in the intermediate strength regime between ideal 
and weak measurements. In this dissertation, a model is proposed for the statistical 
analysis of such measurements, based on a picture of "sampling weak values" from 
different configurations of the system. The model is comprised of two elements: a 
"local weak value" and a "likelihood factor". The first describes the response of 
an idealized weak measurement situation where the back-reaction on the system is 
perfectly controlled. The second assigns a weight factor to possible configurations 
of the system, which in the two-vector formulation correspond to ordered pairs of
wave functions. The distribution of the data in a measurement of arbitrary strength
may then be viewed as the net result of interfering different samples weighted by the
likelihood factor, each of which implements a weak measurement of a different local 
weak value. It is shown that the mean and variance of the data can be connected 
directly to the means and variances of the sampled weak values. The model is then 
applied to a situation similar to a phase transition, where the distribution of the
data exhibits two qualitatively different shapes as the strength parameter is slightly
varied away from a critical value: one below the critical point, where an unusual 
weak value is resolved, the other above the critical point, where the spectrum of 
the measured observable is resolved. In the picture of sampling, the transition 
corresponds to a qualitative change in the sampling profile brought about by the 
competition between the prior sampling distribution and the likelihood factor.

Chapter 1

Introduction 



In this dissertation we propose an alternative model for the statistical analysis of
measurements in quantum mechanics, which is based on a non-standard interpretation
of the theory known as the two-vector formulation of Quantum Mechanics. The
picture that we wish to associate with this model is that the underlying "signal" in
a measurement of some observable A is not given by the eigenvalues a, but rather by a totally
different property attached to the measured system known as the "weak value of
A". We refer to this as the picture of "sampling weak values".

In order to get a clearer understanding of the statement of the problem, we 
shall first review the underlying motivation for the two-vector formulation and the
operational definition of weak values. 

1.1 Two Vector Formulation and Weak Values 

As is well known, standard quantum mechanics is grounded operationally in terms
of ideal measurements, that is, measurements yielding a precise eigenvalue of some
observable A. Such measurements consist of an interaction between the microscopic
system and some macroscopic reference object, the so-called apparatus. This ideal
measurement process plays a two-fold role in the mathematical formulation of the 
theory: 

1. On the one hand, the distinguishable effect on the apparatus, i.e., the measured 
eigenvalue a, provides a selection criterion on the system. This establishes at 
the macroscopic level the correspondence between statistical ensembles and 
the basic mathematical object of the theory: the quantum state $|\psi\rangle$ attached
to the system. The state encodes the maximal available information for the
purpose of prediction, in other words, the outcome probabilities for all possible
future similarly ideal measurements that may be performed on the system. 

2. On the other hand, the apparatus also serves the role of a mechanical reference 
object or "test body" for the standard operational definition of the physical 
property (e.g., "momentum", "energy", "position", etc.) associated with the
observable A. According to this definition, the property "A = a" is defined
specifically in the context of an ideal measurement whereby the quantum state
$|\psi\rangle$ is determined to be an eigenstate of A with eigenvalue a.

To the mathematical formulation, the standard interpretation of the theory adds an
additional postulate, the so-called completeness hypothesis pQ. This states that at 
any given time it is the quantum state $|\psi\rangle$ which constitutes the ultimate description
of the microscopic system. 

It is this hypothesis, in conjunction with the standard operational definition 
of the physical property "A = a", which brings about one of the many well-known
problems of interpretation in quantum mechanics. The problem has to do with the
fact that while the property "A = a" is attached "to" the measured system in the
sense that it labels the state if $A|\psi\rangle = a|\psi\rangle$, the property nevertheless refers implicitly
to the actual experimental arrangement by which the state was determined; this
is in contrast to a classical description where similar properties are always regarded
as being intrinsically "of the system". The question of what it is about the system
that is measured by the apparatus is therefore a very delicate one. 

Or, stated in other words, it is hard to escape viewing the apparatus in
the ideal measurement process as something of a transducer, i.e., as if its purpose
were merely to raise to discernible levels an actually existing microscopic "signal"
associated with the system. But this assumption is equivalent to the assumption that
properties registered in an ideal measurement, say for instance the two possible spin
components "$S_z = +1/2$" or "$S_z = -1/2$", are in fact intrinsic or "non-contextual"
properties of the particle (see, e.g., d'Espagnat [1]). And it is this assumption which
is problematic. 

The problem may be seen as follows. Suppose for instance that in a measurement
of $S_z$ it was "$S_z = 1/2$" which was actually obtained. Then, it must be the
case that if one measures $S_z$ again, the outcome will be, with certainty, +1/2. In this
sense then, one can say that the measurement determines a property of the system
towards the future. But this is different from saying that one infers a property that
existed beforehand. In fact, such inferences are meaningless according to standard
quantum mechanics. For suppose that we had earlier measured $S_x$, with outcome
1/2; then, from our later measurement of $S_z$ we could claim that both "$S_x = 1/2$"
and "$S_z = 1/2$" are true at the intermediate time. But this clearly contradicts
the completeness hypothesis, as no state vector can be simultaneously an eigenstate
of both $S_x$ and $S_z$. Instead, according to the standard interpretation, it was only
"$S_x = 1/2$" which was defined in the intermediate time, and only when the state
vector is "collapsed" by the measurement of $S_z$ does "$S_z = 1/2$" become a definite
property.

Thus we see that to strictly uphold the standard interpretation of the theory
means to give up the idea of inference in the ordinary sense, in other words, the sense
in which we ordinarily tend to think of a measurement as "revealing" properties of
the system. Instead, one is forced to introduce in the description of the system an
irreversible and discontinuous element, the famous "collapse of the wave function".
And the converse implication follows: to develop an inferential framework in which
the results of the measurement are seen as having to do with "actual" properties
of the system, one must go beyond standard textbook quantum mechanics, i.e., to
non-standard interpretations.

The non-standard framework on which our model is based emerged from a
proposed solution to the "collapse" problem by Aharonov, Bergmann, and Lebowitz
0. In 1964, the authors noted that Quantum Mechanics already contains the seeds
for a time-symmetric interpretation in which the microscopic irreversibility associated
with the "collapse of the wave function" could be eliminated. This proposal
was based on the interesting observation that the complete initial conditions encoded
in the quantum state are not the most restrictive conditions that can be
used to delimit a sample of quantum systems at a given time t; for the purpose
of retrodiction, the sample may further be delimited by using final conditions, for
instance the result of a subsequent measurement performed at times later than t.

For example, suppose that it is known that at two successive times $t_1$ and $t_2$
($t_2 > t_1$) complete ideal measurements were performed on a system. The outcomes
of these measurements are described by two state vectors $|\psi_1\rangle$ and $|\psi_2\rangle$ respectively.
If it is also known that at an intermediate time t an ideal measurement of A was
performed (and assuming that otherwise the system was free), then the conditional
probability distribution for the outcomes of this measurement is given by

$$P(a|\psi_1\psi_2) = \frac{|\langle\psi_2|U(t_2,t)\,\Pi(a)\,U(t,t_1)|\psi_1\rangle|^2}{\sum_{a'}|\langle\psi_2|U(t_2,t)\,\Pi(a')\,U(t,t_1)|\psi_1\rangle|^2} \qquad (1.1)$$

where $\Pi(a)$ is a projector onto the eigenspace with the eigenvalue a, and U is the free
evolution operator of the system. To cast this in a time-symmetric form, one defines
two state vectors, propagated from $|\psi_1\rangle$ and $|\psi_2\rangle$ to the intermediate measurement
time t. The first is the usual time-evolved initial vector



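$$|\psi_1;t\rangle = U(t,t_1)\,|\psi_1\rangle , \qquad (1.2)$$

while the second is the final vector evolved backwards in time,

$$|\psi_2;t\rangle = U(t,t_2)\,|\psi_2\rangle = U^\dagger(t_2,t)\,|\psi_2\rangle . \qquad (1.3)$$

In terms of these two vectors,

$$P(a|\psi_1\psi_2;t) = \frac{|\langle\psi_2;t|\Pi(a)|\psi_1;t\rangle|^2}{\sum_{a'}|\langle\psi_2;t|\Pi(a')|\psi_1;t\rangle|^2} . \qquad (1.4)$$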

This form shows then that the probability formula for retrodiction involves two state
vectors which may be attached to the system at the time t, with respect to which
it is time-symmetric (i.e., under the exchange $|\psi_1;t\rangle \leftrightarrow |\psi_2;t\rangle$). The non-trivial
feature in this formula is that the probabilities are not necessarily equivalent to
probabilities derived from a single state vector according to the Born interpretation,
i.e., $P(a|\psi;t) = \|\Pi(a)|\psi;t\rangle\|^2$ [4]. In other words, there is generally no single state
vector $|\psi;t\rangle$ such that $P(a|\psi;t) = P(a|\psi_1\psi_2;t)$.

Thus, in contrast to classical mechanics where a probability statement based 
on mixed boundary conditions (i.e., initial and final) may always be recast in terms of 
initial conditions only, in quantum mechanics initial and mixed boundary conditions 
are inequivalent with respect to the probabilistic statements they entail. It was 
argued therefore that quantum theory could be formulated in terms of the more 
basic notion of the pre- and post-selected ensemble labeled by both initial and final 
conditions. 

It was this idea which later gave rise to the so-called Two-Vector Formulation
of Aharonov, Vaidman and Reznik [SJ H3 [7], according to which the reality of the
system at a given time t is described not by one but rather by the two state vectors
$|\psi_1;t\rangle$ and $|\psi_2;t\rangle$. As in the standard interpretation, the forward-evolving $|\psi_1;t\rangle$
represents the outcome of a prior complete ideal measurement at a time $t_1 < t$;
in this interpretation, however, this vector contains only "half of the story". The
remainder of the story is given by the backward-evolving vector $|\psi_2;t\rangle$, which can
only be determined a posteriori from the outcome of a complete ideal measurement
on the system performed at a time $t_2 > t$ (see Fig. 1.1).

It seems therefore that in this formulation, it should indeed be possible to
assign simultaneous properties to two non-commuting observables, for instance, in
the case considered earlier of two successive measurements of $S_x$ and $S_z$, where
$|\psi_1;t\rangle$ corresponds to "$S_x = 1/2$", and $|\psi_2;t\rangle$ to "$S_z = 1/2$". This however still
leaves the question open as to how to give a non-trivial operational meaning to
statements such as "$S_x = 1/2$ and $S_z = 1/2$" at the intermediate time t.

[Figure 1.1 appears here; panels labeled STANDARD and TWO-VECTOR.]

Figure 1.1: Description of the system at a time t according to the Standard vs.
the Two-Vector Formulations. The solid horizontal lines represent complete ideal
measurements. The lightly shaded regions represent information that according to
each of the formulations is irrelevant for the description of the system at the time t.

One possibility is of course to consider ordinary ideal measurements of $S_x$ or
$S_z$ that could have been performed at this intermediate time. In this sense, it is clear
that given the two boundary conditions, had one also measured $S_x$ at time t then the
outcome certainly must have been +1/2. Similarly, had one measured $S_z$ instead,
then the outcome must also have been +1/2, with certainty. But what about a
joint measurement of $S_x$ and $S_z$? Or say a single measurement of the component
$(S_x + S_z)/\sqrt{2}$, which would seem to be well-defined except that the "inferred value"
is the impossible value $1/\sqrt{2}$!
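The certainty statements above can be checked numerically with the two-vector probability rule for an intermediate ideal measurement. The following is only a minimal sketch: it assumes trivial free evolution (U = 1) between the measurements, sets $\hbar = 1$, and uses helper names (`spin_projectors`, `two_vector_probabilities`) that are purely illustrative.

```python
import math

# Spin-1/2 states as plain Python lists (hbar = 1, free evolution assumed trivial).
S2 = 1 / math.sqrt(2)
PLUS_X = [S2, S2]        # eigenstate of S_x with eigenvalue +1/2 (pre-selection)
PLUS_Z = [1.0, 0.0]      # eigenstate of S_z with eigenvalue +1/2 (post-selection)


def spin_projectors(nx, nz):
    """Projectors onto spin +1/2 and -1/2 along the unit vector (nx, 0, nz)."""
    up = [[(1 + nz) / 2, nx / 2], [nx / 2, (1 - nz) / 2]]
    down = [[(1 - nz) / 2, -nx / 2], [-nx / 2, (1 + nz) / 2]]
    return {+0.5: up, -0.5: down}


def amplitude(final, op, initial):
    """Matrix element <final| op |initial> for real two-component vectors."""
    return sum(final[i] * op[i][j] * initial[j] for i in range(2) for j in range(2))


def two_vector_probabilities(initial, final, projectors):
    """Outcome probabilities of an intermediate ideal measurement, given both boundary states."""
    weights = {a: amplitude(final, proj, initial) ** 2 for a, proj in projectors.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


p_x = two_vector_probabilities(PLUS_X, PLUS_Z, spin_projectors(1.0, 0.0))
p_z = two_vector_probabilities(PLUS_X, PLUS_Z, spin_projectors(0.0, 1.0))
p_diag = two_vector_probabilities(PLUS_X, PLUS_Z, spin_projectors(S2, S2))
mean_diag = sum(a * p for a, p in p_diag.items())
print(p_x[+0.5], p_z[+0.5], p_diag, mean_diag)
```

Under these assumptions the outcome +1/2 is certain for both $S_x$ and $S_z$, whereas an ideal measurement of $(S_x + S_z)/\sqrt{2}$ can only return $\pm 1/2$, so its conditional mean necessarily stays below the "inferred value" $1/\sqrt{2}$.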

Such questions demand a closer examination into the actual dynamics of 
the measurement process and in particular the general notion that in quantum 
mechanics, a measurement is accompanied by a disturbance of the system. This 
notion may be argued from simple complementarity [2] arguments, which suggest 
how the conditions on the apparatus which define what is "ideal" about an ideal
measurement, namely that it yields precise readings, entail conditions which are
far from ideal from the point of view of the back-reaction effects on the system.

For concreteness, suppose one wishes to measure the spin component $S_x$ of
an atom as in a Stern-Gerlach experiment, by imparting an impulse $\delta p = g S_x$ to the
momentum p along the x direction (where g is a coupling constant). This momentum
plays the role of the "pointer variable" of the apparatus. An effective Hamiltonian
describing the coupling between the two degrees of freedom is then $H = -g\,\delta(t)\,x\,S_x$,
which simulates a brief passage of the atom through an inhomogeneous magnetic
field with a linear gradient in the x direction. This coupling, however, also describes
a back-reaction effect on the spin, namely the precession of the angular momentum
vector around the x-axis by an angle $\delta\theta = g x$. Now, as in an ideal measurement
one would need to define p to an accuracy $\Delta p \ll g$, then its complementary variable
x must be uncertain by an amount $\Delta x \gg g^{-1}$. This entails however that the
uncertainty in the rotation angle is already $\Delta\theta \gg 1$, i.e., of an order greater than
one complete revolution (see Fig. 1.2).
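The loss of angular information can be made quantitative in a simple classical picture. The sketch below assumes, purely for illustration, that x is Gaussian distributed with standard deviation $\Delta x$, and computes the sample average of $\cos(g x)$, i.e., the factor by which a spin component transverse to the x axis survives the averaging over rotation angles. The function name `spin_coherence` is hypothetical.

```python
import math


def spin_coherence(g, delta_x, n_steps=20001, n_sigmas=8.0):
    """Average of cos(g*x) over a Gaussian spread of x with standard deviation delta_x.

    The rotation angle about the measured axis is theta = g*x, so this average
    is the factor by which transverse spin components survive in the sample.
    """
    half_width = n_sigmas * delta_x
    step = 2 * half_width / (n_steps - 1)
    weighted_sum = 0.0
    norm = 0.0
    for k in range(n_steps):
        x = -half_width + k * step
        weight = math.exp(-x * x / (2 * delta_x ** 2))
        weighted_sum += weight * math.cos(g * x)
        norm += weight
    return weighted_sum / norm


g = 1.0
for delta_x in (0.1, 1.0, 5.0):
    closed_form = math.exp(-(g * delta_x) ** 2 / 2)  # Gaussian average of cos(g*x)
    print(delta_x, spin_coherence(g, delta_x), closed_form)
```

For $g\,\Delta x \ll 1$ the factor stays close to one, while for $g\,\Delta x \gg 1$, the condition required of an ideal measurement, it is exponentially small.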

The argument illustrates therefore that the defining conditions of the apparatus
necessary for an ideal measurement of a spin component simultaneously entail
a de-phasing condition: the "washing out" of angular momentum information sensitive
to a rotation around the measured spin axis. It seems therefore that in order
to probe non-trivial aspects of quantum mechanics which may seem natural from
the point of view of the two-vector description, one must resort to alternative intermediate
measurement procedures where the connection between the two vectors is
not broken by this de-phasing action of the apparatus.

Figure 1.2: A classical picture of de-phasing in a Stern-Gerlach apparatus. The
gradient in the magnetic field induces a differential rotation in the spin components
perpendicular to the x axis. If the uncertainty in gx is of order $\pi$, this angular
information is lost in the sample.

It was this insight which led the group of Aharonov to consider the scheme
of weak measurements, from which the concept of weak values ultimately emerged.
The weak measurement scheme differs from that of ideal measurements in that
instead of controlling the apparatus "pointer variable" p so as to ensure a precise
reading in a single trial, it is now the dispersion $\Delta x$ in the complementary variable
x which one seeks to minimize so as to ensure a minimal back-reaction. Thus, for
instance, the mutual disturbance entailed by a pair of measurements of two non-commuting
observables may be controlled if one sacrifices the statistical significance
of a single reading of the pointer variables. This cost is easily offset in the long
run; the systematic effects on the pointers may still be recovered when the weak
measurement is performed independently on each member of a large enough sample
of similarly conditioned systems, i.e., as in a so-called "precision measurement".

Now, when developed within a purely quantum description, what the analysis
of weak measurements revealed was the remarkable way in which the apparatus
should respond systematically to those systems that happen to fulfill the initial and
final conditions prescribed by the two vectors $|\psi_1;t\rangle$ and $|\psi_2;t\rangle$. For instance, if the
initial and final states are such that "$S_x = 1/2$ and $S_z = 1/2$" respectively, then
indeed weak measurements of $(S_x + S_z)/\sqrt{2}$ register the "impossible" value $1/\sqrt{2}$!
[HI El (Fig. 1.3). More generally, on a sample of systems pre- and post-selected in
the states $|\psi_1;t\rangle$ and $|\psi_2;t\rangle$ respectively, the average displacement of the pointer
variable in a weak measurement of A is given by

$$\langle\delta p\rangle = \mathrm{Re}\,A_w(t) \qquad (1.5)$$

where $A_w(t)$ is the weak value of A

$$A_w(t) = \frac{\langle\psi_2;t|A|\psi_1;t\rangle}{\langle\psi_2;t|\psi_1;t\rangle} . \qquad (1.6)$$

The imaginary part of $A_w(t)$ can also be related in the context of weak measurements
to a change of order $\Delta x^2$ in the expectation value of the complementary variable x.
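As a concrete illustration of the weak value as a ratio of matrix elements, the following sketch evaluates it for the spin-1/2 example with pre-selection "$S_x = 1/2$" and post-selection "$S_z = 1/2$". It assumes $\hbar = 1$ and trivial free evolution; the helper names (`spin_operator`, `bracket`, `weak_value`) are illustrative only.

```python
import math

S2 = 1 / math.sqrt(2)
SIGMA = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}
PLUS_X = [S2, S2]     # pre-selected state, "S_x = 1/2"
PLUS_Z = [1.0, 0.0]   # post-selected state, "S_z = 1/2"


def spin_operator(nx, ny, nz):
    """Spin component (hbar = 1) along (nx, ny, nz): S = (n . sigma) / 2."""
    return [[(nx * SIGMA["x"][i][j] + ny * SIGMA["y"][i][j] + nz * SIGMA["z"][i][j]) / 2
             for j in range(2)] for i in range(2)]


def bracket(final, op, initial):
    """<final| op |initial>, with complex conjugation of the final vector."""
    return sum(complex(final[i]).conjugate() * op[i][j] * initial[j]
               for i in range(2) for j in range(2))


def weak_value(final, op, initial):
    """Weak value <final|op|initial> / <final|initial>; undefined for orthogonal states."""
    identity = [[1, 0], [0, 1]]
    return bracket(final, op, initial) / bracket(final, identity, initial)


w_x = weak_value(PLUS_Z, spin_operator(1, 0, 0), PLUS_X)
w_z = weak_value(PLUS_Z, spin_operator(0, 0, 1), PLUS_X)
w_y = weak_value(PLUS_Z, spin_operator(0, 1, 0), PLUS_X)
w_diag = weak_value(PLUS_Z, spin_operator(S2, 0, S2), PLUS_X)
print(w_x, w_z, w_y, w_diag)
```

In this example the weak values of $S_x$ and $S_z$ are both 1/2, the weak value of $(S_x + S_z)/\sqrt{2}$ is $1/\sqrt{2}$, and the weak value of $S_y$ is purely imaginary.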

Figure 1.3: Geometry of Weak Values for a Spin-1/2 particle (real part shown only).
The polarizations of the initial and final states are u and v. The real part of the
weak spin vector $\mathbf{S}_w$ bisects the angle between the two directions and its length is
such that onto each of these directions the projection is 1/2 (in units of $\hbar$). In a weak
measurement of the spin component along some arbitrary direction $\mathbf{a}$, the average
kick on the apparatus, from a sample satisfying the two boundary conditions, is
then $\mathbf{S}_w \cdot \mathbf{a}$.

The most salient feature of the weak value is therefore that as opposed to the
standard expectation value $\langle\psi_1;t|A|\psi_1;t\rangle$, its real part may take values outside the
spectrum of A if such spectrum is bounded [Sl llUl lS]. Thus any number of
non-intuitive results may follow if the weak value is viewed as some sort of "posterior average"
of the eigenvalues of A. Instead, in the context of weak measurements, weak values
provide a new way of interpreting the standard expectation value. This is based
on the fact that the small disturbance condition entails that the probability of a
transition $|\langle\psi_2;t|\psi_1;t\rangle|^2$ between the initial and final state is practically unmodified
by the presence of the measurement. The standard expectation value of A, which
is the observed mean value $\langle\delta p\rangle$ on the pre-selected sample, can therefore be
understood as an average of weak values:

$$\langle\psi_1;t|A|\psi_1;t\rangle = \sum_{\psi_2} |\langle\psi_2;t|\psi_1;t\rangle|^2 \times \frac{\langle\psi_2;t|A|\psi_1;t\rangle}{\langle\psi_2;t|\psi_1;t\rangle} \qquad (1.7)$$

where the sum runs over the final states defined by the post-selection. This sum rule 
shows that while in general the weak value will take values outside of the spectrum 
of A, exceptionally large weak values are registered only under equally exceptional 
or unlikely conditions; in other words, the most likely weak values are still the ones 
falling within the ordinary range of expectation. But more importantly, the sum rule 
suggests that the weak value may be interpreted as a more basic definite property 
of the system, only that it is generally uncertain a priori, i.e., to the extent that 
the "destiny" of the system, as defined by the final state $|\psi_2;t\rangle$, cannot be known
in advance. 
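The sum rule lends itself to a direct numerical check. The sketch below takes the pre-selected state "$S_x = 1/2$", the observable $(S_x + S_z)/\sqrt{2}$, and a post-selection basis deliberately chosen (an arbitrary illustrative choice) so that one of the two final states is nearly orthogonal to the initial one. All function names are hypothetical and $\hbar = 1$.

```python
import math

S2 = 1 / math.sqrt(2)
PLUS_X = [S2, S2]                                   # pre-selected state
A = [[0.5 * S2, 0.5 * S2], [0.5 * S2, -0.5 * S2]]   # (S_x + S_z)/sqrt(2), hbar = 1


def dot(u, v):
    """Inner product of two real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def apply(op, v):
    """Matrix-vector product."""
    return [dot(row, v) for row in op]


def post_selection_basis(angle):
    """Orthonormal pair of real spin states polarized along +/- a direction in the x-z plane."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, s], [-s, c]]


def weak_value_table(initial, op, basis):
    """List of (transition probability, weak value) pairs, one per final state in the basis."""
    table = []
    for final in basis:
        overlap = dot(final, initial)
        table.append((overlap ** 2, dot(final, apply(op, initial)) / overlap))
    return table


table = weak_value_table(PLUS_X, A, post_selection_basis(-math.pi / 2 + 0.2))
average_of_weak_values = sum(prob * w for prob, w in table)
expectation = dot(PLUS_X, apply(A, PLUS_X))
print(table, average_of_weak_values, expectation)
```

With this choice one final state occurs with a probability of about one percent and carries a weak value well outside the range $[-1/2, 1/2]$, yet the probability-weighted average of the two weak values reproduces the ordinary expectation value.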

Returning then to the previously mentioned problem of inference posed by
the standard interpretation, we thus see that the two-vector formulation, in conjunction
with the scheme of weak measurements, suggests an attractive solution,
the "twist" of which lies in the separation between the measurement procedures
by which the two concepts of "state" and "physical property" are to be defined
operationally:

1. According to the two-vector formulation, the most basic ensemble to which
the system may be assigned at a time t is the pre- and post-selected ensemble
defined by the outcome of two complete ideal measurements, which is
truly the maximal ensemble in the sense of both prediction and retrodiction.
Such are the ensembles described by the two state vectors $|\psi_1;t\rangle$ and $|\psi_2;t\rangle$.
The role of ideal measurements in establishing the connection between statistical
ensembles and the concept of state is thus preserved as in the standard
interpretation.

2. However, in contrast to the standard interpretation, the operational definition 
of the physical property associated with the observable A is to be grounded 
on weak measurements, i.e. from the weak value of A Jl]. This presents no 
contradiction to the standard definition of "A = a" when the initial state is an
eigenstate of A; in such cases the weak value is well-defined and coincides with
the eigenvalue a. But since weak measurements hardly disturb the individual 
system, i.e., the state is not "collapsed", the weak value retains its operational 
meaning even in the context in which other observables are measured weakly. 
It is this fact that allows weak values to be regarded as intrinsic properties of 
the system. 

1.2 Statement of The Problem 

The idea of formulating the model presented in this dissertation emerged from a 
question that has been troubling me for a couple of years: 

In what sense can the weak value of A be interpreted as a definite mechanical 
effect of the system on the measuring apparatus? 

This question was prompted by the fact that when the weak measurement 
scheme is analyzed quantum mechanically, it is also possible to view the unusual 
effects of weak values as something of a mathematical curiosity: an atypical way in
which certain wave functions describing the apparatus, shifted by the eigenvalues of
A, happen to interfere so as to yield something that appears to be a "kick" of the
apparatus pointer variable p by the weak value. The impression of a "conspiracy in
the errors" is only heightened by the fact that the statistics that show weak values
are the ones where an additional final condition is controlled on the system, so it
is also legitimate to wonder whether at the level of probabilities, Bayes' theorem plays
a role in this conspiracy. 

My first attempt at an answer was to look at these effects by drawing parallels 
with a classical Bayesian analysis of the measurement scheme. The result of this 
was that weak values could be interpreted as posterior averages of some quantity 
"A", attached to the system, but only if one uses negative probabilities to account
for the interference terms as in the Wigner representation. This, however, turned
the problem of interpreting weak values into the much more abstract problem of
interpreting non-standard probabilities [12], and so I finally gave up on this route.
Fortunately, two useful leads did come out of this parallel with the classical situation: 

First came an awareness of the importance of the variable x conjugate to 
the apparatus pointer variable p, which drives the reaction back on the system. 
As it turns out, when in the classical case one is interested in predicting the data, 
information about this variable is irrelevant. However, the variable becomes entirely 
relevant when the data is analyzed in retrospect, against initial and final boundary 
conditions on the system; prior knowledge of this variable then enters into our a
posteriori inferences about both a) the state of the system that is sampled in a
measurement and b) the state of the apparatus before the measurement started.
This convinced me that there was something qualitatively important about looking
at the measurement process given two boundary conditions on the system, as it is
then that one expects the data to show an imprint of the back-reaction on the
system entailed by the variable x.

Secondly, it also became obvious from the Bayesian analysis that what one 
calls an inference about the system in the measurement process is strictly tied to the 
underlying model one has for the data. What may then seem contradictory from the 
point of view of one model may be entirely plausible from the other. This led me to
suppose that perhaps the entirely different apparatus conditions for ideal and weak 
measurements entail, in parallel, qualitatively different dynamical conditions in the 
measurement interaction, and that in turn, these differences should be interpreted 
in terms of two different effective models for the data. 

With the two above leads a general scenario emerged, which will be described 
in full in the coming chapter: 

When the apparatus pointer-variable statistics are analyzed in the light of
fixed initial and final (complete) boundary conditions, a clear distinction emerges
between two ideal extremes depending on the initial preparation of the apparatus.
Each extreme corresponds to a deliberate "control" on the part of the experimentalist
aiming at optimizing either side of the disturbance vs. precision trade-off entailed
by the uncertainty relation $\Delta x\,\Delta p \geq 1/2$. Correlatively, it is possible to associate
with each extreme a linear statistical model of the form

$$p_f = p_i + A \qquad (1.8)$$

that describes the resultant conditional distribution of the data in terms of "kicks"
proportional to A: in the case of sharp p, what we shall call the standard linear
model (SLM), in which the "A" takes values on the spectrum of A; in the case of
sharp x, a weak linear model (WLM) in which "A" is the real part of the weak value
$A_w$.
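The two extremes can be illustrated with a small numerical experiment on the spin-1/2 example used earlier. The sketch below assumes a real Gaussian pointer wave function in p, unit coupling, pre-selection "$S_x = 1/2$", post-selection "$S_z = 1/2$" and the observable $(S_x + S_z)/\sqrt{2}$; the amplitudes $\langle\psi_2|\Pi(a)|\psi_1\rangle$ are entered by hand and the function name is hypothetical. A sharp x corresponds here to a broad pointer distribution (large `sigma_p`), and a sharp p to a narrow one.

```python
import math

S2 = 1 / math.sqrt(2)
# Eigenvalues a of A = (S_x + S_z)/sqrt(2) and amplitudes <psi_2|Pi(a)|psi_1>
# for pre-selection "S_x = 1/2" and post-selection "S_z = 1/2" (worked out by hand).
AMPLITUDES = {+0.5: S2 * (1 + math.sqrt(2)) / 2, -0.5: S2 * (1 - math.sqrt(2)) / 2}


def conditional_pointer_mean(sigma_p, n_points=4001):
    """Mean pointer reading p_f on the post-selected sample, for unit coupling.

    The initial pointer wave function is a real Gaussian in p whose probability
    density has standard deviation sigma_p and is centered at p_i = 0.
    """
    half_width = 0.5 + 8 * sigma_p
    step = 2 * half_width / (n_points - 1)
    mean_sum = 0.0
    norm = 0.0
    for k in range(n_points):
        p = -half_width + k * step
        # Superposition of pointer wave functions shifted by each eigenvalue.
        psi = sum(c * math.exp(-(p - a) ** 2 / (4 * sigma_p ** 2))
                  for a, c in AMPLITUDES.items())
        mean_sum += p * psi * psi
        norm += psi * psi
    return mean_sum / norm


for sigma_p in (0.05, 0.5, 2.0, 10.0):
    print(sigma_p, conditional_pointer_mean(sigma_p))
```

For a sharp pointer the conditional mean is an average of the eigenvalues $\pm 1/2$ and therefore lies inside the spectrum; for a broad pointer the interference of the two shifted wave functions moves the mean towards the real part of the weak value, $1/\sqrt{2}$.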

The fact that the two models are applicable in either extreme can be argued 
as a consequence of two different conditions by which it seems reasonable that the 
distribution of the data may be separated in terms of variables attached to the 
system or the apparatus respectively. In the "strong" extreme $\Delta p \to 0 \Rightarrow \Delta x \to \infty$,
these conditions can be tied to de-phasing, the loss of phase information in the data;
in the weak extreme $\Delta x \to 0 \Rightarrow \Delta p \to \infty$, the conditions can be tied to physical
separability: the almost complete absence of entanglement between the system and 
the apparatus.

In between these two ideal extremes lies the "limbo" of non-ideal measurements
where neither model is applicable; from within the perspective of the two
above ideal extremes, this corresponds to the fact that neither has an effective
de-phasing been achieved as required for the SLM analysis, nor has the necessary
degree of "weakness" or physical separability been achieved as required for the WLM
analysis. When viewed from this perspective, the "limbo" region should hence be
of considerable interest when analyzed in the light of final boundary conditions, as
the non-separability of the conditional data may then be interpreted as the signature
of the intrinsic quantum mechanical non-separability of the apparatus-system
composite at the time of the measurement interaction.

For instance, it may seem reasonable to expect that in moving from one extreme
to another within the parameter space of measurement strength, i.e., $\Delta x$, one
should encounter in the limbo region an intermediate transition regime separating
two regimes in each one of which the data is approximately captured by either of
the two descriptions. One may then speculate that this transition in the description
of the data is a signature of something analogous to a phase transition, an underlying
qualitative change in the actual physics of the measurement interaction as one
moves from one regime to the other in the strength parameter space.

Now, there is of course a way of describing the limbo region based on the 
probability amplitudes from which the conditional distributions of the data are ultimately
derived. At present, however, the sense in which the interference patterns
are understood is based on the spectral decomposition of A. Such a description
may be appropriate in a strong regime, where approximate statistical separability
is possible under the SLM, but it fails to do justice to the overall qualitative behavior
exhibited in the weak regime, where the mass of the resultant conditional
distribution of the data may lie well outside the prior region of expectation. 

What is missing therefore is a picture at the level of probability amplitudes 
that "sharpens" as the ideal conditions for statistical separability under the WLM 
are approached, in other words, that sharpens with the complementary variable x 
of the apparatus. 

1.3 Summary of Results 

The aim of the model proposed in this dissertation is then to provide this complementary
description. The idea is that the WLM, or a linear statistical model based
on weak values, can be approached from the point of view of a quantum analog of
a non-linear classical model in which a picture of "sampling" weak values is always
at the forefront.

As we shall see in Chapter 3, it is possible to establish, by turning the 
emphasis towards the complementary variable x of the apparatus, a clear criterion 
by which the real part of the weak value can be regarded as a definite kick of the 
pointer variable. This can be shown by considering narrow "sample" test functions 
of the apparatus in which the support in x is bounded. In that case, the shift in the 
conjugate variable p can be seen to be in direct correspondence with a phase gradient 
as in ordinary wave mechanics. Furthermore, by changing the location of the sample 
along x, the response of the pointer is given by different "local" weak values, each one 
corresponding to a different pair of initial and final states parameterized by x. Thus
one obtains a picture where, as the location of the test function is varied, one samples
a different configuration of the system. The distribution of the data for an arbitrary
apparatus preparation may then be understood as the resulting interference pattern 
when samples at various locations in x are coherently superposed, what we call a 
superposition of weak measurements. 
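The correspondence between a pointer shift and a phase gradient can be sketched numerically. The precise definition of the local weak value is developed in later chapters; the form used below, $\langle\psi_2|A\,e^{ixA}|\psi_1\rangle / \langle\psi_2|e^{ixA}|\psi_1\rangle$, is an assumption made for this illustration, motivated by the impulsive coupling $H = -g\,\delta(t)\,x\,A$ with g = 1, for which a definite x rotates the system by $e^{ixA}$. The spin-1/2 boundary conditions and the function names are likewise illustrative.

```python
import cmath
import math

S2 = 1 / math.sqrt(2)
PSI_1 = [S2, S2]      # pre-selection, "S_x = 1/2"
PSI_2 = [1.0, 0.0]    # post-selection, "S_z = 1/2"
A = [[0.5 * S2, 0.5 * S2], [0.5 * S2, -0.5 * S2]]   # (S_x + S_z)/sqrt(2), hbar = 1


def apply(op, v):
    """Matrix-vector product (works for complex entries)."""
    return [sum(op[i][j] * v[j] for j in range(2)) for i in range(2)]


def rotated_state(x):
    """exp(i x A)|psi_1>, using A^2 = 1/4 so that exp(i x A) = cos(x/2) + 2i sin(x/2) A."""
    a_psi = apply(A, PSI_1)
    return [math.cos(x / 2) * PSI_1[i] + 2j * math.sin(x / 2) * a_psi[i] for i in range(2)]


def transition_amplitude(x):
    """<psi_2| exp(i x A) |psi_1> (psi_2 is real, so no conjugation is needed)."""
    state = rotated_state(x)
    return sum(PSI_2[i] * state[i] for i in range(2))


def local_weak_value(x):
    """Assumed form of the local weak value: <psi_2|A e^{ixA}|psi_1> / <psi_2|e^{ixA}|psi_1>."""
    a_state = apply(A, rotated_state(x))
    return sum(PSI_2[i] * a_state[i] for i in range(2)) / transition_amplitude(x)


def phase_gradient(x, h=1e-5):
    """Central finite difference of the phase of the transition amplitude."""
    ratio = transition_amplitude(x + h) / transition_amplitude(x - h)
    return cmath.phase(ratio) / (2 * h)


for x in (0.0, 0.5, 1.5, 3.0):
    print(x, local_weak_value(x).real, phase_gradient(x))
```

Under this assumption the real part of the local weak value coincides with the gradient in x of the phase of the transition amplitude, and at x = 0 it reduces to the ordinary weak value $1/\sqrt{2}$ of the example.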

A more delicate question involves the interpretation, in the non-weak regime, 
of what in the weak regime corresponds to the imaginary part of the weak value. It 
is this component which in the model is associated with the Bayesian aspect. 

The insight into this association is developed first in Chapter 4, where we 
consider the classical probabilistic analysis of the measurement with two boundary 
conditions on the system. This analysis shows how the posterior distribution of the 
classical pointer variable acquires a non-trivial dependence on the prior distribution 
in its conjugate variable x. This dependence has to do, as mentioned earlier, both
with the region of the system's phase space that is sampled, as well as with a reassessment
of the probabilities for possible initial conditions of the apparatus. This
dependence is summarized in terms of what is known as a likelihood factor, which
describes the passage from prior to posterior probabilities given the conditions on
the system. 

From the classical analysis we then develop the quantum analysis in Chapter 5,
drawing both on a formal correspondence and on the quantitative correspondences
that one should expect in the classical limit. The semi-classical analysis
shows that the real part of the local weak value corresponds in the classical limit
to the classical response of the apparatus given a definite value of x. Moreover,
in the semi-classical analysis one can also establish, for the quantum case, a direct
correspondence with the classical likelihood factor. The model is then developed
for more general boundary conditions by drawing a correspondence with the
semi-classical case. The two elements of the model are then the local real part of the
weak value, which is a non-linear function of x, and the likelihood factor. These
two elements provide an intuitive way of understanding the two foremost statistics
of the data, the mean and the variance. We obtain some new results in connection
with such "error laws".

Furthermore, the picture that emerges is that one samples different weak 
values, corresponding to different configurations of the system, but the a priori 
sampling weights are modified by the likelihood factor. The weak linear model 
is then recovered when the "sampling distribution" in x is sharp enough that the 
uncertainty in the sampled weak values is small. In that case, the likelihood factor 
entails a small shift of the a priori distribution in x, which is then connected to the 
imaginary part of the complex weak value. 

However, as the width in x is increased, the likelihood factor produces qualitative
changes in the sampling distribution. In Chapter 6 we explore this phenomenon
for those cases where an unlikely combination of boundary conditions
yields "eccentric" weak values. Those cases can be connected to an interesting
phenomenon in Fourier analysis known as super-oscillations, where the phase of a
function oscillates in a certain interval more rapidly than any one of the component
Fourier modes. However, as super-oscillations are exponentially suppressed in
amplitude, this translates in the model to regions in x where the likelihood factor is
at or close to a minimum; the tendency of the likelihood factor is then
to "widen" the sampling distribution. What happens then is that as the strength
parameter $\Delta x$ is increased away from zero, at some critical value the sampling
distribution shows a behavior analogous to a phase transition, where it goes from a
single-peaked to a double-peaked function. In the reciprocal space of the pointer
variable, the transition corresponds to the shift of the expectation value from the
"eccentric" region to the normal region of expectation, accompanied by "beats".
We give an example where the beats are directly connected to the spectrum of the
observable A.
Chapter 2



Preliminaries: Standard and 
Weak Linear Models 

In this chapter we introduce the general setting in which we would like to place
our non-linear Bayesian model of non-ideal measurements. Associated with any
measurement scheme is some statistical model, a constraint equation allowing us to
connect the data to the properties that are to be inferred from the measurement.
The well-known von Neumann [?] measurement scheme is perhaps the simplest
caricature of a measurement interaction and leads to the simplest possible model:
the linear model. It turns out that this model, which we shall henceforth refer to
as the standard linear model, is consistent with quantum mechanical predictions to
the extent that the statistics are analyzed against initial conditions only; moreover,
it is consistent under very general non-ideal conditions on the apparatus. However,
the model may fail when the statistics are controlled for the most restrictive type
of conditions that can be imposed on the measured system, namely initial and final
conditions. It is this failure that gives room to the unexpected effects associated
with weak values, and which suggests that an alternative interpretation of the data
may be in order.

2.1 The von Neumann Scheme 

The von Neumann measurement scheme consists of an interaction between two
initially unentangled systems, the "system" proper and an external apparatus. The
aim of this interaction is to produce an effect on the apparatus from which to infer the
value of some observable A of the system. The distinction between the two systems
follows from the underlying assumption that the "system" is generally microscopic
while the apparatus is either macroscopic, or else satisfies certain classical properties
expected of a macroscopic object, in which case the measurement is called an indirect
measurement. One such property is, for instance, that the mass be large enough
that quantum inertial effects (i.e., wave-packet spreading) can be neglected on the
side of the apparatus, at least for the duration of the measurement interaction.
The apparatus is then idealized as a system of infinite mass with a vanishing free
Hamiltonian, described by a pair of canonically conjugate variables x, p (with $[\hat{x},\hat{p}] = i$).
We distinguish the variable p as the pointer variable, the observable on which the
effect is analyzed and from which the datum is ultimately obtained. In addition, we
shall also refer to the conjugate variable x as the reaction variable, for reasons that
will become evident shortly.

The simplest dynamical model of a von Neumann interaction is described by
the impulsive Hamiltonian operator

$$\hat{H}_M(t) = -\delta(t - t_I)\,\hat{A}\hat{x}, \tag{2.1}$$

coupling A to the reaction variable x, where the delta function models the fact that
the interaction time is negligible compared to that of the free evolution of the system.
What distinguishes this type of coupling is that the impulsive unitary operator

$$\exp\left(-i\int dt\,\hat{H}_M(t)\right) = e^{i\hat{A}\hat{x}} \tag{2.2}$$

thereby defined induces in the Heisenberg picture a linear transformation
of the pointer variable operator

$$\hat{p}_f = e^{-i\hat{A}\hat{x}}\,\hat{p}_i\,e^{i\hat{A}\hat{x}} = \hat{p}_i + \hat{A}. \tag{2.3}$$

Were one to drop the hats, this equation would be interpreted classically as a "kick"
of the pointer variable proportional to the value of "A". In such a case, the value of
"A" could then be inferred from the impulse imparted to the apparatus.
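The step leading to Eq. (2.3) can be spelled out. The following is a short worked derivation, assuming only the commutator $[\hat{x},\hat{p}] = i$ and that $\hat{A}$, being a system operator, commutes with both apparatus variables; it uses the standard expansion $e^{X} Y e^{-X} = Y + [X,Y] + \tfrac{1}{2!}[X,[X,Y]] + \dots$ with $X = -i\hat{A}\hat{x}$.

```latex
\begin{align*}
e^{-i\hat{A}\hat{x}}\,\hat{p}\,e^{i\hat{A}\hat{x}}
  &= \hat{p} + [-i\hat{A}\hat{x},\,\hat{p}]
     + \tfrac{1}{2!}\,[-i\hat{A}\hat{x},\,[-i\hat{A}\hat{x},\,\hat{p}]] + \dots \\
  &= \hat{p} - i\hat{A}\,[\hat{x},\hat{p}] + 0 \\
  &= \hat{p} + \hat{A}.
\end{align*}
```

The series terminates after the first commutator because $[\hat{x},\hat{p}]$ is a c-number, which is why the transformation of the pointer variable is exactly linear in $\hat{A}$.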

The archetype of such a scheme is provided by the Stern-Gerlach apparatus
(see Fig. 2.1), in which case A stands for a given spin component, i.e., $S_x$, and x
stands for the translational coordinate of the particle along the direction parallel to
the spin component. The spin component is then determined from the asymptotic
deflection of the particle, which in the limit $t \to \infty$ is proportional to the imparted
impulse. We should note that a possible coupling constant, which for instance in the
S-G device would involve the product of the gyromagnetic factor and the magnetic
field gradient, can always be absorbed in a canonical redefinition of x and p. Other
examples of such linear von Neumann setups can be found in [19].
Now, the conditions under which the above classical analysis of the datum
can be performed in a single realization of the measurement correspond to what
we shall henceforth refer to as a strong measurement, or what is commonly known
as the "ideal" realization of the measurement. This means that the initial state of
the apparatus is sufficiently well-defined in p that the value of "A" can be inferred
precisely from the displacement $\delta p = p_f - p_i$. As is well known, the possible "kicks"
are then the eigenvalues {a} of A, which occur with probability

$$P(a|\hat{\rho}_s) = \mathrm{Tr}[\hat{\Pi}(a)\,\hat{\rho}_s], \tag{2.4}$$

where $\hat{\Pi}(a)$ is the projection operator onto the corresponding eigenspace and $\hat{\rho}_s$ is
the density matrix describing the initial preparation of the system.

2.2 The Standard Linear Model 

In more realistic "non-ideal" situations, the initial state of the apparatus will have a
finite and perhaps considerable dispersion in p. Strictly speaking then, the classical
picture of "kicks" proportional to the eigenvalues of A should no longer be applicable.
However, it is easily shown that if the initial states of the system and apparatus are
physically separable, i.e., no entanglement, then even in less than ideal circumstances,
the predicted distribution of the data is still statistically separable under the
c-number linear model

$$p_f = p_i + a, \tag{2.5}$$

which we shall here refer to as the "standard linear model", or SLM for short, in which
$p_f$ is the datum, $p_i$ plays a role analogous to the "noise", and the "signal" a (the
target of inference) takes values on the eigenvalues of A. By statistical separability
we shall mean that the resultant distribution of the data can be decomposed, in
terms of a number of additional conditions, so that $p_i$ and a can be treated at some
level as if they were independent random variables, in this case attached to the
apparatus and the system respectively.

Consistency of the predicted distributions with the SLM follows from the
equivalence between the Heisenberg and Schrödinger pictures and the assumption
of physical separability. To see this, consider first the case we shall keep in mind
throughout this dissertation, that of a pure preparation in which the system and
apparatus are prepared in a factorable state $|\Psi_i\rangle = |\psi_i\rangle \otimes |\phi_i\rangle$, where $|\psi_i\rangle$ is the
initial state of the system and $|\phi_i\rangle$ that of the apparatus. With the measurement
interaction, $|\Psi_i\rangle$ undergoes the transformation

$$|\Psi_i\rangle = |\psi_i\rangle \otimes |\phi_i\rangle \;\to\; |\Psi_f\rangle = e^{i\hat{A}\hat{x}}\,|\Psi_i\rangle. \tag{2.6}$$
Figure 2.1: Schematic of a Stern-Gerlach apparatus for the measurement of a spin
component, in this case illustrating the measurement of the $S_x$-component of a
spin-1 particle. Directions of spin not perpendicular to the beam path may be measured
by passing the beam through a uniform magnetic field oriented in such a way as
to produce a desired rotation of the spin axis relative to the measured axis. The
data is obtained from the vertical position of the spot on the screen. In the ideal
case, only three spots are seen, always aligned with the direction of the S-G magnet.
In the non-ideal realization illustrated here, the dispersion in the data is so large
that the eigenvalues are barely distinguishable.
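The probability distribution for the data is then

$$dP(p|\Psi_f) = dp\,\langle\Psi_f|\delta(p - \hat{p})|\Psi_f\rangle. \tag{2.7}$$

Now use the Heisenberg picture transformation (2.3) and the spectral resolution of
A to obtain

$$\begin{aligned}
dP(p|\Psi_f) &= dp\,\langle\Psi_i|e^{-i\hat{A}\hat{x}}\,\delta(p - \hat{p})\,e^{i\hat{A}\hat{x}}|\Psi_i\rangle \\
&= dp\,\langle\Psi_i|\delta(p - \hat{p} - \hat{A})|\Psi_i\rangle \\
&= \sum_a \langle\psi_i|\hat{\Pi}(a)|\psi_i\rangle\; dp\,\langle\phi_i|\delta(p - \hat{p} - a)|\phi_i\rangle \\
&= \sum_a P(a|\psi_i)\; dP(p - a|\phi_i).
\end{aligned} \tag{2.8}$$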

From this equation we observe that the distribution of the data takes the form of a
"broadened" version of the spectral distribution $P(a|\psi_i)$: the convolution of $P(a|\psi_i)$
with a probability distribution for the "noise" $dP(p|\phi_i)$. To illustrate this, we show
in Fig. 2.2 the resultant distribution for a spin-1 measurement with three values
of the uncertainty in p. Fig. 2.3 then shows how in the non-ideal cases, where the
peaks of the spectrum are not resolved, it is still possible to view the distribution
as a sum of broadened spectral distributions.

It is this form which underlies the fact that even if the uncertainty in the
noise is large but its probability distribution is known, then after a large number of
independent and identical realizations of the measurement one may still determine
properties of the spectral distribution from the observed frequency distribution of
the data. For instance, if we know the initial mean value $\langle p\rangle_i$ of the pointer variable
and its variance $\Delta p_i^2$, we may then use the "error" formulas which stem from the
SLM

$$\begin{aligned}
\langle p\rangle_f &= \langle p\rangle_i + \langle a\rangle \\
\Delta p_f^2 &= \Delta p_i^2 + \Delta a^2
\end{aligned} \tag{2.9}$$

to connect the observed means and variances in the data with the standard
expectation value of A and its variance

$$\begin{aligned}
\langle a\rangle &= \sum_a P(a|\psi_i)\,a = \langle\psi_i|\hat{A}|\psi_i\rangle \\
\Delta a^2 &= \sum_a P(a|\psi_i)\,(a - \langle a\rangle)^2 = \langle\psi_i|\hat{A}^2|\psi_i\rangle - \langle\psi_i|\hat{A}|\psi_i\rangle^2.
\end{aligned} \tag{2.10}$$

More generally, the spectral distribution can be extracted by performing a
deconvolution on the frequency distribution of the data (although for noisy data the problem
is not entirely without complications, see e.g., [13]).
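The error formulas (2.9) can be checked numerically on the spin-1 example used in the figures. The sketch below is an illustration under stated assumptions: the pointer "noise" $dP(p|\phi_i)$ is taken to be a zero-mean Gaussian of standard deviation sigma, the spectral probabilities 1/16, 3/8, 9/16 are those quoted for Fig. 2.3, and the function names and the integration grid are hypothetical choices made here.

```python
import math

def gaussian_pdf(p, mean, sigma):
    """Normal density, used here as the assumed pointer 'noise' distribution."""
    z = (p - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def slm_distribution(spectrum, sigma, grid):
    """Eq. (2.8): sum over a of P(a|psi_i) times the noise density shifted by a."""
    return [sum(prob * gaussian_pdf(p, a, sigma) for a, prob in spectrum.items())
            for p in grid]

def mean_and_variance(density, grid, step):
    """Riemann-sum estimates of the mean and variance of a sampled density."""
    norm = sum(density) * step
    mean = sum(p * d for p, d in zip(grid, density)) * step / norm
    var = sum((p - mean) ** 2 * d for p, d in zip(grid, density)) * step / norm
    return mean, var

# Spectral probabilities of the spin-1 example (Fig. 2.3): m_u = -1, 0, 1.
spectrum = {-1.0: 1 / 16, 0.0: 3 / 8, 1.0: 9 / 16}
mean_a = sum(a * prob for a, prob in spectrum.items())
var_a = sum((a - mean_a) ** 2 * prob for a, prob in spectrum.items())

step = 0.01
grid = [-10.0 + step * k for k in range(2001)]

results = {}
for sigma in (0.1, 0.35, 0.75):
    density = slm_distribution(spectrum, sigma, grid)
    results[sigma] = mean_and_variance(density, grid, step)
    # Compare the numerical (mean, variance) with the SLM prediction of Eq. (2.9).
    print(sigma, results[sigma], (mean_a, sigma ** 2 + var_a))
```

For each noise level the numerical mean and variance should agree with $\langle a\rangle$ and $\sigma^2 + \Delta a^2$, even when the three spectral peaks are no longer resolved.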
Figure 2.2: Pointer variable probability distributions for three values of the "noise"
level in the apparatus preparation. In the three cases, the system is a spin-1 particle
prepared in an eigenstate $|s = 1, m_z = 1\rangle$ of $S_z$; the measurement is of the spin
component $S_u$, along a direction $\mathbf{u} = \sin(\pi/3)\,\mathbf{e}_x + \cos(\pi/3)\,\mathbf{e}_z$; the apparatus initial
state is a minimum uncertainty packet with a standard deviation $\sigma$ in p initially
centered at p = 0. The case $\sigma = 0.1$ illustrates the ideal situation in which the
spectrum of $S_u$ is clearly distinguished.
Figure 2.3: Break-up of the non-ideal distributions in Fig. 2.2 as a mixture of
broadened spectral distributions. The spectral probabilities
$\langle s = 1, m_z = 1|\hat{\Pi}(m_u)|s = 1, m_z = 1\rangle$ are 1/16, 3/8, 9/16 for $m_u = -1, 0, 1$ respectively.
These probabilities correspond to the areas under the three peaks in the ideal situation
$\sigma = 0.1$. In all three cases the expectation value of p over $dP(p|\Psi_f)$ is 1/2 and the
variance is $\sigma^2$ plus the variance of the spectral distribution, $\Delta a^2 = 3/8$.
Another equally instructive way of seeing the consistency of the SLM is by
expanding the combined initial state in an eigenbasis of A, i.e.,

$$|\Psi_i\rangle = \left[\sum_{a,g} \langle a,g|\psi_i\rangle\,|a,g\rangle\right] \otimes |\phi_i\rangle, \tag{2.11}$$

where g stands for some additional degeneracy index. The combined final state may
then be written as

$$|\Psi_f\rangle = \sum_{a,g} \langle a,g|\psi_i\rangle\,|a,g\rangle \otimes |e^{ia\hat{x}}\phi_i\rangle, \tag{2.12}$$

where $|e^{ia\hat{x}}\phi_i\rangle$ is $|\phi_i\rangle$ shifted in p by a. The distribution of the data can then be
obtained from the resultant partial density matrix of the apparatus, which is obtained
by "tracing out" the states of the system from the projector $|\Psi_f\rangle\langle\Psi_f|$. Using the
orthogonality of the basis $\{|a,g\rangle\}$, one then finds

$$\begin{aligned}
\hat{\rho}_a(\Psi_f) &= \sum_{a,g} |\langle a,g|\psi_i\rangle|^2\; |e^{ia\hat{x}}\phi_i\rangle\langle e^{ia\hat{x}}\phi_i| \\
&= \sum_a P(a|\psi_i)\; |e^{ia\hat{x}}\phi_i\rangle\langle e^{ia\hat{x}}\phi_i|.
\end{aligned} \tag{2.13}$$

The partial density matrix describes therefore a mixture of shifted states. This
mixture could have been generated, for instance, by applying unitary transformations
$e^{ia\hat{x}}$ on the initial state of the apparatus, where the momentum shifts corresponded
to some external random parameter a distributed according to the probabilities
$P(a|\psi_i)$.

Finally, we note that statistical separability under the SLM ensues in the
more general case in which the two systems are prepared in a mixed and classically
correlated separable state of the form

$$\hat{\rho}_i = \sum_\chi P(\chi|E)\,\hat{\rho}_s(\chi) \otimes \hat{\rho}_a(\chi), \tag{2.14}$$

where $\chi$ may be some external uncertain classical parameter. The predicted
distribution of the data may then be decomposed as

$$dP(p|\hat{\rho}_f) = \sum_\chi P(\chi|E) \sum_a P(a|\hat{\rho}_s(\chi))\; dP(p - a|\hat{\rho}_a(\chi)), \tag{2.15}$$

which is nothing more than a statistical mixture of broadened spectral distributions.

Thus we see that in a von Neumann linear measurement, and insofar as the
combined initial state of the two systems is not entangled, the predicted distribution
of the data is statistically separable under the SLM, i.e., a linear statistical model
in which the "signal" takes values on the eigenvalues of A. It may be tempting
therefore to interpret this consistency as an indication of a wider range of validity
of the classical dynamical picture $p_f = p_i + a$ underlying the SLM; in other words,
that even when the spectrum is not fully resolved it is still assumed that on every
realization of the measurement the pointer variable suffers a definite (i.e. "real")
"kick" $a \in \mathrm{Spec}(\hat{A})$, except that the values of $p_i$ and a fluctuate statistically on a
trial-by-trial basis.

However, as we shall see shortly, it is indeed possible to distinguish certain 
populations of the system from which the distribution of the data is inconsistent with 
the SLM, and hence with the underlying physical picture. These are populations 
that are singled out according to additional conditions that the system may be 
made to satisfy after the measurement interaction, conditions which define the so-called
pre- and post-selected ensembles mentioned in the introduction. We digress
momentarily to develop the appropriate notation we shall use when dealing with 
such ensembles. 

2.3 Description of the Post-Selected Statistics in Terms 
of Relative States 

Let us then suppose that after our von Neumann measurement of A, a second
complete ideal measurement is performed independently on the system, the possible
outcomes of which correspond to a complete orthonormal set of final states $\{|\psi_\mu\rangle\}$
with $\langle\psi_\mu|\psi_\nu\rangle = \delta_{\mu\nu}$ and $\hat{1}_s = \sum_\mu |\psi_\mu\rangle\langle\psi_\mu|$. An example of how such a post-selection
may be implemented for a spin-1 particle is given in Fig. 2.4. Together with the
fixed initial state $|\psi_i\rangle$, each $|\psi_\mu\rangle$ defines a pre- and post-selected ensemble for the
system that will be labeled throughout this dissertation by the index $\mu$. We shall
generally refer to such an ensemble simply as a "transition" $|\psi_i\rangle \to |\psi_\mu\rangle$, where it
should always be understood that since in the interim time the system interacted
with our apparatus, transition probabilities are not necessarily $|\langle\psi_\mu|\psi_i\rangle|^2$; instead
they are given by

$$P(\psi_\mu|\Psi_f) = \langle\Psi_f|\,\big(|\psi_\mu\rangle\langle\psi_\mu|\big)\,|\Psi_f\rangle, \tag{2.16}$$

which we denote as the perturbed transition probabilities. Finally, and for simplicity,
unless otherwise noted, we henceforth use the terms "conditional" and "unconditional"
in the sense of conditioning or not against the final outcome $|\psi_\mu\rangle$ of the
post-selection.

Now, referring to the states $|\Psi_i\rangle$ and $|\Psi_f\rangle$ of Eq. (2.6), a convenient way of
keeping track of both the conditional and unconditional statistics is by means of the
Figure 2.4: Schematic of a Stern-Gerlach setup with a post-selection. The second
magnet splits the beam into three additional components, here corresponding to
three eigenstates of $S_y$. A post-selected sample for the first measurement
corresponds to the set of all those events which fall into any one of the three distinct
regions along the y direction produced by the third measurement. Note that since the x
and y directions are perpendicular, the respective sets of canonical variables $(x, p_x)$
and $(y, p_y)$ commute; hence, if these two translational degrees of freedom are initially
uncorrelated, the two magnets implement independent measurements.
relative-state expansion of the combined final state $|\Psi_f\rangle$ defined by the final basis:

$$|\Psi_f\rangle = \sum_\mu \sqrt{P(\psi_\mu|\Psi_f)}\;|\psi_\mu\rangle \otimes |\phi_f^{(\mu)}\rangle, \tag{2.17}$$

where $|\phi_f^{(\mu)}\rangle$ is the state of the apparatus relative to the final outcome $|\psi_\mu\rangle$,

$$|\phi_f^{(\mu)}\rangle = \frac{1}{\sqrt{P(\psi_\mu|\Psi_f)}}\,\langle\psi_\mu|e^{i\hat{A}\hat{x}}|\psi_i\rangle\,|\phi_i\rangle. \tag{2.18}$$

Note that $P(\psi_\mu|\Psi_f)$ can be obtained from the normalization condition of this state.
The relative state $|\phi_f^{(\mu)}\rangle$ encodes all the available statistical information about the
apparatus, conditional on the specific transition $|\psi_i\rangle \to |\psi_\mu\rangle$.

In turn, to obtain the unconditional statistics, one may take the partial trace
of $|\Psi_f\rangle\langle\Psi_f|$ to obtain an alternative decomposition of the partial density matrix of
the apparatus

$$\hat{\rho}_a(\Psi_f) = \sum_\mu P(\psi_\mu|\Psi_f)\;|\phi_f^{(\mu)}\rangle\langle\phi_f^{(\mu)}|. \tag{2.19}$$

That this decomposition should yield the same density matrix as the one described
by equation (2.13) is a good illustration of the fact that the break-up of a mixed
state into a convex sum of pure states is not unique. What is, however, unique
about this particular decomposition is that the components of the mixture can
be distinguished a posteriori, in the sense that the corresponding statistics can be
analyzed separately, using the information provided by the post-selection.



2.4 Failure of the SLM Under Both Initial and Final 
Conditions 

Let us now consider the conditional probability distribution of the data which follows
from a given relative state $|\phi_f^{(\mu)}\rangle$ as given in Eq. (2.18). Resolving A, we see that
$|\phi_f^{(\mu)}\rangle$ expands as a linear combination of momentum shifts of the initial state

$$|\phi_f^{(\mu)}\rangle = \frac{1}{\sqrt{P(\psi_\mu|\Psi_f)}} \sum_a \langle\psi_\mu|\hat{\Pi}(a)|\psi_i\rangle\;|e^{ia\hat{x}}\phi_i\rangle, \tag{2.20}$$

each shift proportional to one of the eigenvalues. This defines therefore a relative
wave function in the p representation which is a coherent superposition of shifted
wave functions weighted by generally complex coefficients

$$\phi_f^{(\mu)}(p) \propto \sum_a \langle\psi_\mu|\hat{\Pi}(a)|\psi_i\rangle\;\phi_i(p - a). \tag{2.21}$$
The conditional distribution of the data, i.e., $dP(p|\phi_f^{(\mu)}) = dp\,|\langle p|\phi_f^{(\mu)}\rangle|^2$, may thus
be written as:

$$dP(p|\phi_f^{(\mu)}) = \frac{dp\,\left|\sum_a \langle\psi_\mu|\hat{\Pi}(a)|\psi_i\rangle\,\phi_i(p - a)\right|^2}{\int dp'\,\left|\sum_{a'} \langle\psi_\mu|\hat{\Pi}(a')|\psi_i\rangle\,\phi_i(p' - a')\right|^2}, \tag{2.22}$$

where the normalization constant in the denominator is a re-expression of the
perturbed transition probability $P(\psi_\mu|\Psi_f)$.

From the form of Eq. (2.22) we can immediately see that the presence of
interference terms of the form

$$\langle\psi_\mu|\hat{\Pi}(a)|\psi_i\rangle\langle\psi_i|\hat{\Pi}(a')|\psi_\mu\rangle\;\phi_i(p - a)\,\phi_i^*(p - a') \qquad (a \neq a') \tag{2.23}$$

prevents us from reducing this equation to either the statistically separable form
of a convolution of a probability distribution in p and a probability distribution of
the eigenvalues, as in Eq. (2.8), or to a mixture of such forms as in Eq. (2.15). This
means therefore that the conditional distributions arising from the post-selected
subsamples are generally not consistent with the standard linear model.

Aside from the trivial case in which either $|\psi_i\rangle$ or the $|\psi_\mu\rangle$ are eigenstates of
A, the notable exception is when the no overlap condition

$$\phi_i(p - a)\,\phi_i^*(p - a') \simeq 0 \qquad (a \neq a') \tag{2.24}$$

is satisfied. In this case, the conditional distributions do reduce to the separable
form

$$dP(p|\phi_f^{(\mu)}) = \sum_a P(a|\psi_i\psi_\mu)\; dP(p - a|\phi_i), \tag{2.25}$$

where $P(a|\psi_i\psi_\mu)$ is the conditional distribution

$$P(a|\psi_i\psi_\mu) = \frac{|\langle\psi_\mu|\hat{\Pi}(a)|\psi_i\rangle|^2}{\sum_{a'} |\langle\psi_\mu|\hat{\Pi}(a')|\psi_i\rangle|^2}, \tag{2.26}$$

presented in the introduction. The no overlap condition is of course the condition
for the strong or "ideal" measurement where, as mentioned previously, the dynamical
picture underlying the SLM is strictly applicable.

On the other hand, if $\phi_i(p)$ is wide enough that the interference terms become
relevant in Eq. (2.22), the dynamical picture $p_f = p_i + a$ is clearly inappropriate. As
we show in Fig. 2.5 for the same example considered in Fig. 2.3, the discrepancies
may be quite dramatic. For instance, if the spectrum of A is bounded, the
Figure 2.5: Posterior breakup of the unconditional distributions $dP(p|\Psi_f)$ in
Fig. 2.2 according to the results of a post-selecting measurement of $S_v$, where
$\mathbf{v} = \sin(2\pi/3)\,\mathbf{e}_x + \cos(2\pi/3)\,\mathbf{e}_z$. Two manifestations of interference effects in the
conditional distributions are a small bump at $p \simeq -1.3$ for $m_v = 0$, $\sigma = 0.35$ and,
more notably, that the lower quartile of the $m_v = 1$, $\sigma = 0.75$ distribution lies
approximately at the upper boundary of the "allowed" region $[-1, 1]$.
central mass of the conditional distribution may lie outside the region of expectation
defined by the SLM.

What is interesting is that even in those cases we must nevertheless recover
the separable form consistent with the dynamical SLM picture in the process of
pooling the data from all the post-selected subsamples (this is also illustrated in
Fig. 2.5). This is a consequence of the equivalence between the decompositions (2.19)
and (2.13) of the partial density matrix $\hat{\rho}_a(\Psi_f)$ from which the unconditional data
is obtained, which in particular entails the sum rule for the data

$$\sum_\mu P(\psi_\mu|\Psi_f)\; dP(p|\phi_f^{(\mu)}) = dP(p|\Psi_f). \tag{2.27}$$

This sum rule hides something of a "statistical decoherence" in the process of pooling
the data: substituting in Eq. (2.22) and noting that its denominator is the
perturbed transition probability $P(\psi_\mu|\Psi_f)$, we see that

$$\sum_\mu P(\psi_\mu|\Psi_f)\; dP(p|\phi_f^{(\mu)}) \tag{2.28}$$

may be written as

$$\begin{aligned}
& dp \sum_\mu \left|\sum_a \langle\psi_\mu|\hat{\Pi}(a)|\psi_i\rangle\,\phi_i(p - a)\right|^2 \\
&\quad = dp \sum_\mu \sum_{a,a'} \langle\psi_i|\hat{\Pi}(a')|\psi_\mu\rangle\langle\psi_\mu|\hat{\Pi}(a)|\psi_i\rangle\;\phi_i(p - a)\,\phi_i^*(p - a') \\
&\quad = dp \sum_{a,a'} \langle\psi_i|\hat{\Pi}(a')\left[\sum_\mu |\psi_\mu\rangle\langle\psi_\mu|\right]\hat{\Pi}(a)|\psi_i\rangle\;\phi_i(p - a)\,\phi_i^*(p - a');
\end{aligned}$$

now, using the completeness of the final basis, $\sum_\mu |\psi_\mu\rangle\langle\psi_\mu| = \hat{1}$, and the orthogonality
of the projection operators, $\hat{\Pi}(a)\hat{\Pi}(a') = \delta_{a,a'}\hat{\Pi}(a)$, this reduces to

$$\begin{aligned}
& dp \sum_{a,a'} \langle\psi_i|\hat{\Pi}(a')\hat{\Pi}(a)|\psi_i\rangle\;\phi_i(p - a)\,\phi_i^*(p - a') \\
&\quad = dp \sum_a \langle\psi_i|\hat{\Pi}(a)|\psi_i\rangle\,|\phi_i(p - a)|^2 \\
&\quad = \sum_a P(a|\psi_i)\; dP(p - a|\phi_i).
\end{aligned} \tag{2.29}$$

Thus we see that the interference terms in the conditional distributions add up to
zero leaving only the incoherent terms, which are the ones yielding the separable
form of Eq. (2.8) consistent with the SLM.
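The cancellation behind the sum rule (2.27) can be seen in a small numerical sketch. Everything specific below is an assumption made for illustration: a two-level system whose observable has eigenvalues +1 and -1, real initial and final states parameterized by the hypothetical angles alpha and beta, and a Gaussian pointer amplitude of width sigma in p. The code compares, point by point in p, the pooled conditional weights with the incoherent SLM form of Eq. (2.8).

```python
import math

def phi(p, sigma):
    """Assumed Gaussian pointer amplitude phi_i(p); |phi|^2 has standard deviation sigma."""
    return math.exp(-p * p / (4.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma) ** 0.25

# Hypothetical two-level system: A has non-degenerate eigenvalues +1 and -1.
eigenvalues = (1.0, -1.0)
alpha, beta = 0.4, 1.1                        # arbitrary illustrative angles
psi_i = (math.cos(alpha), math.sin(alpha))    # initial state in the eigenbasis of A
final_basis = ((math.cos(beta), math.sin(beta)),
               (-math.sin(beta), math.cos(beta)))   # orthonormal post-selection basis

def amplitude(psi_mu, k):
    """<psi_mu|Pi(a_k)|psi_i> for real state vectors."""
    return psi_mu[k] * psi_i[k]

def conditional(psi_mu, p, sigma):
    """P(psi_mu|Psi_f) dP/dp: the squared sum in the numerator of Eq. (2.22)."""
    total = sum(amplitude(psi_mu, k) * phi(p - a, sigma)
                for k, a in enumerate(eigenvalues))
    return total * total

def incoherent(psi_mu, p, sigma):
    """The same quantity with the interference terms (2.23) dropped."""
    return sum((amplitude(psi_mu, k) * phi(p - a, sigma)) ** 2
               for k, a in enumerate(eigenvalues))

def slm(p, sigma):
    """Unconditional density of Eq. (2.8)."""
    return sum(psi_i[k] ** 2 * phi(p - a, sigma) ** 2
               for k, a in enumerate(eigenvalues))

sigma = 0.75
grid = [-4.0 + 0.25 * k for k in range(33)]

# Interference is present in a single post-selected subsample ...
max_interference = max(abs(conditional(final_basis[0], p, sigma)
                           - incoherent(final_basis[0], p, sigma)) for p in grid)
# ... but cancels when the subsamples are pooled.
max_sum_rule_gap = max(abs(sum(conditional(mu, p, sigma) for mu in final_basis)
                           - slm(p, sigma)) for p in grid)
print(max_interference, max_sum_rule_gap)
```

The first number printed is of order the interference term itself, while the second should vanish to rounding error, which is the content of Eq. (2.29).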
We should note then the non-trivial significance of the cancellations behind
the sum rule (2.27): given the actual sequence of events of first reading the datum
and then post-selecting, any features arising from the interference terms in the
conditional distributions will be statistically indistinguishable a priori against the
background of the SLM-consistent unconditional distribution of the data $dP(p|\Psi_f)$.
Thus, discrepancies with the naive dynamical picture underlying the SLM are most
definitely not obvious. They are only revealed a posteriori (here in the literal
chronological sense), after binning the data using the trial-by-trial record of correlations
between the readings and the outcome of the post-selection.

2.5 Weak Measurements and Weak Values 

As mentioned in the introduction, in a weak measurement we seek to minimize
the back-reaction on the measured system. It is easily seen from the measurement
Hamiltonian (2.1) that this reaction is dictated by the variable x conjugate to the
pointer variable; for instance, following the Heisenberg dynamics on the side of the
system, one can see that an arbitrary observable B of the system is transformed as

$$\hat{B}_f = e^{-i\hat{A}\hat{x}}\,\hat{B}_i\,e^{i\hat{A}\hat{x}}. \tag{2.30}$$

The aim is therefore to ensure that the dispersion in x should be small around x = 0,
so that if B is sensitive ($[\hat{B}, \hat{A}] \neq 0$), then $\hat{B}_f \simeq \hat{B}_i$.

This aim may also be seen from the point of view of entanglement. As one
can see, if the initial state of the apparatus $|\phi_i\rangle$ were a "perfect" eigenstate of x,
i.e. $|\phi_i\rangle = |x = 0\rangle$, then the measurement transformation (Eq. 2.6) would leave
the initial factorable state $|\Psi_i\rangle = |\psi_i\rangle \otimes |x = 0\rangle$ unchanged. Thus, one may
view the minimal dispersion condition as being close to the ideal situation in which
the initial physical separability or no entanglement between system and apparatus
is preserved.

Now, as this aim can only be accomplished in general at the price of spreading
the distributions in the conjugate variable p, the remarks made in the previous
sections should then serve to underscore the relevance of the two boundary conditions
in the statistical analysis of weak measurements. To wit, the unconditional
distribution of the data from a pre-selected sample will show no unusual deviations
from the SLM picture; it will only appear as a highly broadened spectral distribution.
On the other hand, we should expect a considerable overlap between the
shifted wave functions in the conditional distributions (2.22) of the post-selected
sub-samples, and hence "hidden" deviations from the SLM dynamical picture.
What is interesting is that from these complicated interference effects a simple
picture emerges, whereby the conditional statistics appear to reflect a single,
well-defined "kick", proportional to the real part of the weak value of A, defined as

$$A_w^{\mu} = \frac{\langle\psi_\mu|\hat{A}|\psi_i\rangle}{\langle\psi_\mu|\psi_i\rangle}. \tag{2.31}$$

It was this fact, in conjunction with the defining conditions of weak measurements,
which prompted the group of Aharonov and collaborators to propose that the weak
value is an appropriate operational description of a system in between two ideal
complete measurements. As in part the purpose of this dissertation is to provide a
firmer grasp on the concepts of weak measurements and weak values, we shall here
only give a cursory look at how weak values were originally derived and some of the
unusual properties associated with them.

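Aharonov, Albert and Vaidman [?] showed that if $\langle\psi_\mu|\psi_i\rangle \neq 0$, and if
$\phi_i(x) = \langle x|\phi_i\rangle$ is "sufficiently narrow" (say about x = 0) in a sense to be clarified
shortly, then an excellent approximation to the relative state $|\phi_f^{(\mu)}\rangle$ of Eq. (2.18) is
possible by retaining the first-order term in x of the Taylor series expansion of $e^{i\hat{A}\hat{x}}$,

$$|\phi_f^{(\mu)}\rangle \simeq \frac{1}{\sqrt{P(\psi_\mu|\Psi_f)}}\,\langle\psi_\mu|\,1 + i\hat{A}\hat{x}\,|\psi_i\rangle\,|\phi_i\rangle, \tag{2.32}$$

and then re-expressing this in terms of the weak value as

$$|\phi_f^{(\mu)}\rangle \simeq \frac{\langle\psi_\mu|\psi_i\rangle}{\sqrt{P(\psi_\mu|\Psi_f)}}\; e^{iA_w^{\mu}\hat{x}}\,|\phi_i\rangle. \tag{2.33}$$

Under this approximation, the relative state may then be thought of as the initial
state shifted in p by the complex weak value $A_w^{\mu}$.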

Let us briefly discuss the conditions under which the above approximation
can hold as it stands. As one can see, normalization of the relative state in the form
of (2.33) shows that the perturbed transition probability is

$$P(\psi_\mu|\Psi_f) = |\langle\psi_\mu|\psi_i\rangle|^2\,\left\|\, e^{-\mathrm{Im}A_w^{\mu}\,\hat{x}}\,|\phi_i\rangle \right\|^2. \tag{2.34}$$

To ensure that the normalization is in fact possible, one demands therefore that
the wave function $\phi_i(x)$ should fall off faster than $e^{-|\mathrm{Im}A_w^{\mu}\,x|}$. This ensures that the
Fourier transform $\phi_i(p)$ is an analytic function in p, at least in a strip surrounding the
real p axis bounded by $\pm i\,\mathrm{Im}A_w^{\mu}$, a fact which is then consistent with an expression
of the wave function in p as

$$\phi_f^{(\mu)}(p) \propto \phi_i(p - A_w^{\mu}). \tag{2.35}$$
Moreover, the Taylor expansion demands that the higher "weak moments" should
be small, for instance,

$$\frac{\langle\psi_\mu|(\hat{A} - A_w^{\mu})^2|\psi_i\rangle}{\langle\psi_\mu|\psi_i\rangle}\,\Delta x^2 \ll 1. \tag{2.36}$$

Finally, as the fall-off condition must also be consistent with the Taylor expansion,
the imaginary part should also be small compared to $1/\Delta x$,

$$\mathrm{Im}A_w^{\mu}\,\Delta x \ll 1, \tag{2.37}$$

so as to ensure that the transition probability agrees with that obtained from the
first-order Taylor expansion. These conditions can then be met by making $\Delta x$
sufficiently small if the fall-off criterion is simultaneously satisfied. If this is the
case, then the term "weak measurement" is appropriate, as the transition probability
is essentially the unperturbed transition probability

$$P(\psi_\mu|\Psi_f) = |\langle\psi_\mu|\psi_i\rangle|^2 + O(\Delta x^2). \tag{2.38}$$

The above weakness conditions entail therefore that the effects associated with the
imaginary part are of the same order as the weakness parameter $\Delta x$, and hence can
be made as small as desired by minimizing $\Delta x$. These effects include a small shift
in the mean value of the conditional distribution in x, $dP(x|\phi_f^{(\mu)})$,

$$\langle x\rangle = -2\,\mathrm{Im}A_w^{\mu}\,\Delta x^2, \tag{2.39}$$

as well as corrections to the shape of the conditional distribution of the pointer
variable $dP(p|\phi_f^{(\mu)})$.
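For a Gaussian apparatus preparation the shift (2.39) and the estimate (2.38) can be checked in closed form. The following short worked example assumes $|\phi_i(x)|^2 \propto e^{-x^2/2\Delta x^2}$ (a choice made here for illustration), takes the relative wave function in x to be $e^{iA_w x}\phi_i(x)$ up to normalization, as in the approximation described above, and writes $\kappa = \mathrm{Im}A_w$ for brevity.

```latex
\begin{align*}
|\phi_f(x)|^2 &\propto e^{-2\kappa x}\, e^{-x^2/2\Delta x^2}
   = e^{2\kappa^2 \Delta x^2}\,
     \exp\!\left[-\frac{(x + 2\kappa\,\Delta x^2)^2}{2\,\Delta x^2}\right], \\
\langle x \rangle &= -2\kappa\,\Delta x^2, \qquad
\frac{P(\psi_\mu|\Psi_f)}{|\langle\psi_\mu|\psi_i\rangle|^2}
   = e^{2\kappa^2\Delta x^2} = 1 + O(\Delta x^2).
\end{align*}
```

In this Gaussian case the factor $e^{-2\kappa x}$ simply translates the distribution in x without changing its width, which is one way of seeing why the imaginary part of the weak value acts on the reaction variable rather than on the pointer.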

If we neglect these effects, we can then see that the conditional distribution
of the data is given approximately by the initial probability distribution
displaced by the real part of $A_w^{\mu}$,

$$dP(p|\phi_f^{(\mu)}) \simeq dP(p - \mathrm{Re}A_w^{\mu}|\phi_i). \tag{2.40}$$

It is this form which then suggests that in the ideal limit of weakness $\Delta x \to 0$, the
pointer variable receives a well-defined "kick" proportional to the real part of the
weak value.
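The passage from the strong to the weak regime can be illustrated with a small numerical sketch of the exact conditional distribution (2.22). The setting is an assumption chosen for illustration: a two-level system with eigenvalues +1 and -1, real pre- and post-selected states whose angles alpha and beta are picked so that the weak value equals 3, and a Gaussian pointer amplitude of width sigma_p in p (for a minimum uncertainty packet with $\hbar = 1$ this corresponds to $\Delta x = 1/(2\sigma_p)$). The helper names are hypothetical.

```python
import math

def phi(p, sigma_p):
    """Assumed Gaussian pointer amplitude in p (unnormalized; the norm cancels)."""
    return math.exp(-p * p / (4.0 * sigma_p * sigma_p))

# Hypothetical two-level system: A has eigenvalues +1 and -1.
eigenvalues = (1.0, -1.0)
alpha = math.pi / 4.0
beta = math.atan(-0.5)
psi_i = (math.cos(alpha), math.sin(alpha))     # pre-selected state
psi_mu = (math.cos(beta), math.sin(beta))      # post-selected state

# Transition amplitudes <psi_mu|Pi(a)|psi_i> (real for these real states).
amps = [psi_mu[k] * psi_i[k] for k in range(2)]
# Weak value, Eq. (2.31): <psi_mu|A|psi_i> / <psi_mu|psi_i>.
weak_value = sum(a * c for a, c in zip(eigenvalues, amps)) / sum(amps)

def conditional_mean(sigma_p, points_per_sigma=20, half_width=12.0):
    """Mean of p under the exact conditional distribution (2.22), by a Riemann sum."""
    step = sigma_p / points_per_sigma
    extent = 1.0 + half_width * sigma_p        # wide enough to cover both shifted packets
    n = int(extent / step)
    num = den = 0.0
    for k in range(-n, n + 1):
        p = k * step
        w = sum(c * phi(p - a, sigma_p) for a, c in zip(eigenvalues, amps)) ** 2
        num += p * w
        den += w
    return num / den

print("weak value:", weak_value)
for sigma_p in (0.1, 1.0, 10.0):
    print(sigma_p, conditional_mean(sigma_p))
```

For a sharp pointer the conditional mean lies inside the spectrum, as the separable form (2.25) requires, whereas for a broad pointer it approaches the weak value, which here lies outside the interval [-1, 1].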



2.6 "Eccentric" Weak Values and Statistically Significant
Events

What is most surprising about this picture, in light of the consistency with the
SLM of the unconditional distribution $dP(p|\Psi_f)$, is that the "kicks" may now take
arbitrarily large magnitudes, even beyond the range of the spectrum of $A$ if the spectrum
is bounded. For example, let $|\psi_1\rangle$ and $|\psi_2\rangle$ be the coherent states $|\lambda\rangle$
and $|-\lambda\rangle$, eigenstates of the annihilation operator $a$ with eigenvalues $\pm\lambda$ respectively.
Then the weak value of the occupation number operator $N = a^\dagger a$ is

$$N_w = \frac{\langle -\lambda|a^\dagger a|\lambda\rangle}{\langle -\lambda|\lambda\rangle} = -|\lambda|^2 , \tag{2.41}$$

an impossible result under the SLM given that the spectrum of $N$ is non-negative.
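
As a quick numerical illustration (not part of the original derivation), the weak value of the number operator between two opposite coherent states can be evaluated directly in a truncated Fock basis. The function name `number_weak_value` and the cutoff `n_max` are illustrative assumptions of this sketch.

```python
def number_weak_value(lam: complex, n_max: int = 120) -> float:
    """Weak value of N = a^dagger a, pre-selected in |lam>, post-selected in |-lam>.

    Uses <-lam|n><n|lam> proportional to (-|lam|^2)^n / n!; the common factor
    exp(-|lam|^2) cancels in the ratio. n_max truncates the Fock basis
    (an assumption of this sketch).
    """
    s = -abs(lam) ** 2      # successive terms differ by a factor s / (n + 1)
    term = 1.0              # n = 0 term
    norm = 0.0              # accumulates <-lam|lam> up to the common factor
    numer = 0.0             # accumulates <-lam|N|lam> up to the same factor
    for n in range(n_max + 1):
        norm += term
        numer += n * term
        term *= s / (n + 1)
    return numer / norm


if __name__ == "__main__":
    # For lam = 3 the result lies close to -9, outside the spectrum of N.
    print(number_weak_value(3.0))
```

The truncated sums alternate in sign and cancel strongly, which is the same interference that makes the post-selection so improbable.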

These "impossible" displacements provide a beautiful illustration of quantum
mechanical interference when analyzed as a superposition of shifted wave functions
in $p$. Using the fact that $|\lambda\rangle = e^{-|\lambda|^2/2}\sum_n \frac{\lambda^n}{\sqrt{n!}}|n\rangle$, the relative wave function in $p$
expands as

$$\phi_f(p) = \sum_n e^{-|\lambda|^2}\,\frac{(-|\lambda|^2)^n}{n!}\,\phi_i(p - n) , \tag{2.42}$$

in other words, a convolution of the initial wave function with a Poisson-type
weight of alternating sign. As $\phi_i(p - n)$ varies slowly with $n$, the shifted wave functions will
interfere destructively in the region where the envelope $|\lambda|^{2p}/\Gamma(p + 1)$ is approximately
stationary (i.e. $p \simeq |\lambda|^2 \pm |\lambda|$). The wave function $\phi_f(p)$ is reconstructed as
$\sim \phi_i(p + |\lambda|^2)$ in the region where the interference is least destructive.

The reconstruction of the packet may in fact happen in the tail regions (Fig.
2.6) of $\phi_i(p)$ if $|\lambda|^2 \gg 1$, in which case the displacement $\delta p \simeq -|\lambda|^2$ is larger than
the minimum required standard deviation $\Delta p \sim |\lambda|$ by a factor of order $|\lambda|$. Thus,
it is indeed possible to achieve statistical significance in a single trial, conditioned
of course on the extremely unlikely event that the appropriate transition actually
takes place (for the coherent states $P(-\lambda|\lambda) = e^{-4|\lambda|^2}$).
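
The interference described above can be sketched numerically by summing Gaussian packets shifted by non-negative integers with the alternating Poisson-type weights. This is only an illustration under assumed values: the width `sigma_p = 5` of the initial packet is our choice (wide enough for the weakness and fall-off conditions to hold when $\lambda = 3$), and the function names are hypothetical.

```python
import math


def superposed_packet(p: float, lam: float = 3.0, sigma_p: float = 5.0,
                      n_max: int = 120) -> float:
    """Sum over n of exp(-lam^2) (-lam^2)^n / n! * g(p - n), g a Gaussian amplitude.

    sigma_p is an assumed width for the initial pointer packet; n_max truncates
    the sum once the weights are negligible.
    """
    total = 0.0
    weight = math.exp(-lam ** 2)              # n = 0 weight
    for n in range(n_max + 1):
        total += weight * math.exp(-(p - n) ** 2 / (4.0 * sigma_p ** 2))
        weight *= -(lam ** 2) / (n + 1)       # alternating Poisson-type weights
    return total


def conditional_mean(lam: float = 3.0, sigma_p: float = 5.0) -> float:
    """Mean of p under the unnormalized density superposed_packet(p)^2 on a grid."""
    grid = [-45.0 + 0.1 * j for j in range(801)]          # p from -45 to 35
    dens = [superposed_packet(p, lam, sigma_p) ** 2 for p in grid]
    return sum(p * d for p, d in zip(grid, dens)) / sum(dens)


if __name__ == "__main__":
    print(conditional_mean())          # close to the weak value -9
    print(superposed_packet(-9.0))     # tiny amplitude: the transition is rare
```

Although every individual packet is shifted to the right, the resulting packet sits near $p = -|\lambda|^2$, and its very small amplitude reflects the small transition probability.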

At first sight, it appears that these significant effects pose a serious threat 
to causality, as it would then seem possible to do "fortune-telling": in other words, 
to obtain information about the final state from a single event, before the choice 
of the post-selection basis is made. There are in fact two conditions ensuring that 
consistency with macroscopic causality is nevertheless maintained: 

First of all, the fall-off condition of $\phi_i(x)$ resulting from the "weakness
condition" ensures that the Fourier transform $\phi_i(p)$ is an analytic function in the complex
$p$-plane at least on a strip containing the whole real $p$-axis. Thus, at the time that
the datum is read, the analytic information necessary to reconstruct the shape of
the packet is already available everywhere in $p$.

Secondly, as mentioned earlier, any unusual features of the conditional distribution
must be indistinguishable a priori, in other words, "covered" by the noise of
the unconditional distribution $dP(p|\Psi_f)$; hence, the prior probability of finding $p$ in
the region of uncertainty around the unusual mean value, as an "error", is already
greater than the corresponding transition probability itself.

Figure 2.6: The net effect of superposing Gaussian packets shifted by positive integer
values $\delta p = n$, and weighted by $e^{-|\lambda|^2}(-|\lambda|^2)^n/n!$ with $\lambda = 3$, is a packet shifted by
the weak value $-|\lambda|^2 = -9$. The scale of the resultant packet is an indication of
how rarely the appropriate boundary conditions are realized. Nevertheless, if the
conditions are in fact realized, the measurement is almost certain to
yield a negative reading.

We should also note that in the reciprocal $x$-space, the unusual effects correspond
to a phenomenon in Fourier analysis known as superoscillations. This phenomenon
will be discussed in more detail later, in connection with our model.



2.7 The Weak Linear Model

Returning to the conditional distribution of the data in the case of a weak
measurement, i.e., Eq. (2.40), what is interesting therefore is that the statistics can be
approximately described in terms of an alternative linear model where the role of
the "signal" $A$ is now played by the possible weak values of $A$. Let us therefore give
a preliminary formalization of this model.

We define the Weak Linear Model, or WLM for short, as a statistical model in
which the data from a von Neumann linear measurement is viewed as arising from a
displacement of the pointer variable proportional to the real part of the weak value.
This weak value we take to be a definite property of every system belonging to a
given pre- and post-selected ensemble described by complete boundary conditions.
As we shall generally deal with cases where $A_w^{(\mu)}$ has both real and imaginary
parts, we adopt the convention that, unless it is made clear from the context, the real
part will be denoted generically by the symbol $\alpha_\mu$; we shall then refer to $\alpha$ simply
as the "weak value". The model thus reads

$$p_f = p_i + \alpha_\mu , \tag{2.43}$$

where

$$\alpha_\mu = \mathrm{Re}\,\frac{\langle\psi_\mu|A|\psi_1\rangle}{\langle\psi_\mu|\psi_1\rangle} . \tag{2.44}$$

As we have done so far, the index $\mu$ labels the transition (i.e., the pre- and
post-selected ensemble) which may or may not be known to have occurred. This
uncertainty is then quantified by assigning probabilities $P_\mu$ to each of the possible
transitions compatible with the information at hand. When dealing with averages
over these transition probabilities, we shall find it useful to distinguish them from the
usual $\langle\ldots\rangle$ averages within a given state. Transition averages will thus be denoted
with a "bar", so that for instance $\bar\alpha$ stands for

$$\bar\alpha = \sum_\mu P_\mu\,\alpha_\mu . \tag{2.45}$$

Now, as it stands, the WLM is no more than a proposed way of interpreting
the data, and in the same way that we saw for the SLM, one may expect that its
range of validity is quite limited. The claim is then that if the measurement satisfies
appropriate conditions of weakness, where it may be supposed that the apparatus
and the system behave almost as separate entities, then the distribution of the
data becomes approximately separable under the WLM.

As a preliminary check of consistency of this claim, suppose that such weakness
conditions can be made to hold for all the transitions $|\psi_1\rangle \to |\psi_\mu\rangle$ defined by
a particular post-selection. We should then be able to approximately interpret the
unconditional statistics as a reflection of the "scatter" of weak values that follows
from the dispersion in the possible final outcomes of the post-selection. Since the
weakness condition entails that the transition probabilities $P(\psi_\mu|\psi_1) = |\langle\psi_\mu|\psi_1\rangle|^2$
are left practically unchanged, the statement translates to a sum rule in the form of
a convolution

$$dP(p|\Psi_f) \simeq \sum_{\psi_\mu} P(\psi_\mu|\psi_1)\,dP(p - \alpha_\mu|\phi_i) = \overline{dP(p - \alpha|\phi_i)} . \tag{2.46}$$

Consider then the unconditional expectation value of the data. According to the sum
rule, this is given by

$$\overline{\langle p_f\rangle} = \langle p_i\rangle + \bar\alpha . \tag{2.47}$$

Note that we now interpret the unconditional expectation value of the data
as the "bar-average" $\overline{\langle p_f\rangle}$ of the conditional averages $\langle\phi_f^{(\mu)}|p|\phi_f^{(\mu)}\rangle$, whereas
$\langle p_i\rangle = \langle\phi_i|p|\phi_i\rangle$ remains the same, as the "noise" distribution is here assumed to be
independent of the transition. Computing now the bar average of the weak value,

$$\bar\alpha = \sum_\mu |\langle\psi_\mu|\psi_1\rangle|^2\,\mathrm{Re}\,\frac{\langle\psi_\mu|A|\psi_1\rangle}{\langle\psi_\mu|\psi_1\rangle}
= \mathrm{Re}\sum_\mu \langle\psi_1|\psi_\mu\rangle\langle\psi_\mu|A|\psi_1\rangle
= \langle\psi_1|A|\psi_1\rangle , \tag{2.48}$$

we indeed see that the mean displacement of the unconditional distribution is
$\langle\psi_1|A|\psi_1\rangle$, as we derived earlier in terms of the SLM. This illustrates how the
standard expectation value of $A$ may be interpreted either as the expectation value of
the spectral distribution defined by $|\psi_1\rangle$, or just as well as the average of the weak
values from the complete set of transitions defined by a particular post-selection.
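
The sum rule for the mean can be checked numerically on a small example. The sketch below uses a qubit with $A = \sigma_z$; the particular state, the post-selection basis angles, and the function name `weak_values_and_probs` are illustrative assumptions rather than anything taken from the text.

```python
import cmath
import math


def weak_values_and_probs(psi1, basis, A):
    """Return [(P_mu, A_w_mu)] for pre-selected psi1 and each post-selection in basis.

    psi1: list of amplitudes; basis: list of orthonormal vectors;
    A: Hermitian matrix as nested lists.
    """
    dim = len(psi1)
    A_psi = [sum(A[r][c] * psi1[c] for c in range(dim)) for r in range(dim)]
    out = []
    for b in basis:
        overlap = sum(complex(b[r]).conjugate() * psi1[r] for r in range(dim))
        matel = sum(complex(b[r]).conjugate() * A_psi[r] for r in range(dim))
        out.append((abs(overlap) ** 2, matel / overlap))   # (P_mu, weak value)
    return out


# Assumed example: pre-selected qubit state and A = sigma_z.
psi1 = [math.cos(0.3), math.sin(0.3) * cmath.exp(0.7j)]
A = [[1.0, 0.0], [0.0, -1.0]]
# Orthonormal post-selection basis parameterized by two angles (t, f).
t, f = 1.4, -0.4
b0 = [math.cos(t), math.sin(t) * cmath.exp(1j * f)]
b1 = [-math.sin(t) * cmath.exp(-1j * f), math.cos(t)]

pairs = weak_values_and_probs(psi1, [b0, b1], A)
bar_alpha = sum(P * w.real for P, w in pairs)     # transition average of Re A_w
expectation = math.cos(0.3) ** 2 - math.sin(0.3) ** 2

if __name__ == "__main__":
    print(bar_alpha, expectation)   # the two numbers agree
```

The agreement holds for any complete orthonormal post-selection basis, since the weights $|\langle\psi_\mu|\psi_1\rangle|^2$ cancel the denominators of the weak values.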



Similar sum rules for higher moments cannot be interpreted exclusively from
the "scatter" of weak values, but must take into account corrections to the transition 
probabilities and the widths of the unconditional distributions. Corrections to the 
sum rules will be examined more carefully in Chapter 5 in connection with the 
non-linear model. 

2.8 Summary and Motivation for the Non Linear Model 

Let us then summarize the general picture we have tried to present in this section. As
we have seen, with regard to the functional form of the distribution of the data, there
appears to be no qualitative distinction between ideal and non-ideal realizations of 
a von Neumann measurement of A when analyzed against initial conditions only 
(i.e., from a pre-selected ensemble); in either case the data can be interpreted under 
the SLM, i.e., as arising from the same spectral distribution, the only difference 
apparently being the amount of "noise" in the data. Furthermore, as the SLM is a 
c-number transcription of the Heisenberg evolution of the pointer variable operator, 
SLM consistency in the non-ideal case would naturally seem to imply the same 
dynamical picture of the ideal case. It is only when the data is analyzed against 
both initial and final boundary conditions that a clear distinction between ideal and 
non-ideal measurements emerges. The distinction is brought about by interference 
terms in the conditional distributions which do not show up in the unconditional 
distributions. These interference terms prevent the general statistical separability 
of the data under the SLM, except under an ideal apparatus preparation of sharp p 
in which case a no-overlap condition is satisfied. 

In contrast, there is the opposite weak regime of sharp x, where a "com- 
plementary ideal" is almost approached, namely that of physical separability or no 
entanglement between system and apparatus. In such case the interference terms 
are significant in the conditional distributions and the mechanical intuition behind 
the SLM picture is lost altogether. In exchange, however, an alternative picture 
emerges as the data becomes statistically separable under the WLM, in which the 
role of the signal is played by the weak value of A. Even though this signal may take 
values well outside the spectrum of A, it is nevertheless guaranteed by QM that the 
unusual systematic effects associated with weak values should remain hidden in the 
unconditional distributions as demanded by macroscopic causality.
Chapter 3



Sampling Weak Values: An 
Illustrative Example 



Our intention in this and the following chapter is to develop a preliminary intuition 
into the picture of "sampling weak values" that we wish to associate with the non- 
linear model. In this chapter, we introduce the concept of local weak values. The 
model itself will be developed formally in Chapter 5.

3.1 Classical Angular Momentum as a Weak Value 

Consider a free particle in two dimensional space prepared at a time $t_1$ in some
initial sharp state in momentum, for simplicity an eigenstate $|k\rangle$, and post-selected
at a time $t_2$ in the position eigenstate $|q\rangle$, where $q$ and $k$ are vector-valued and
canonically conjugate to each other. For the intermediate measurement we take $A$
to stand for the orbital angular momentum operator in two dimensions

$$\hat L = \hat q_x \hat k_y - \hat q_y \hat k_x \equiv \hat q \wedge \hat k . \tag{3.1}$$

Since the particle is assumed to be free, the free Hamiltonian commutes with $L$ and
the conditional statistics of the measurement will not depend on the intermediate
time $t_i$; we may therefore take $t_i$ to be a time immediately before $t_2$; furthermore,
as $|k\rangle$ is an eigenstate of the free evolution, it acquires a dynamical phase factor at
$t = t_2$ which may be disregarded as it does not depend on the apparatus. It is then
easy to see that for this pair of boundary conditions, the weak value of $L$ is

$$\lambda = \frac{\langle q|\,\hat q \wedge \hat k\,|k\rangle}{\langle q|k\rangle} = q \wedge k . \tag{3.2}$$

Thus, between $t_1$ and $t_2$ the weak value of $L$ coincides with the conserved classical
angular momentum defined by $q$ and $k$.



3.2 Sampling a Real Weak Value over a Narrow Window

Our starting point will be to examine in some detail a canonical example of how such
weak values are realized when the dispersion in the conjugate variable $x$ is controlled,
as seen from the point of view of the $x$-representation. Recalling the definition
of the relative state $|\phi_f^{(\mu)}\rangle$ corresponding to a given post-selection, i.e., Eq. (2.18),
we see that in the $x$-representation the relative wave function $\phi_f^{(\mu)}(x) = \langle x|\phi_f^{(\mu)}\rangle$ is
a product of two terms

$$\phi_f^{(\mu)}(x) = \langle\psi_2|e^{iAx}|\psi_1\rangle\,\phi_i(x) . \tag{3.3}$$

For the boundary conditions in the present example, we then see that the wave
function in the $p$-representation may be written, up to normalization, as the Fourier integral

$$\phi_f(p) \propto \int \frac{dx}{\sqrt{2\pi}}\,e^{-ipx}\,\langle q|e^{iLx}|k\rangle\,\phi_i(x) , \tag{3.4}$$

where we have dropped the transition index for simplicity.

The viewpoint that we wish to emphasize henceforth is that the integration
variable $x$ parameterizes, as a back-reaction, a unitary transformation on the side
of the system. The factor $\langle q|e^{iLx}|k\rangle$ is then viewed as the transition amplitude from
$|k\rangle$ to $|q\rangle$ mediated by an intermediate impulsive rotation of the system around the
$z$ axis by an angle $x$. As we can see, the signs are such that the unitary operator
$e^{iLx}$ induces an active clockwise rotation by $x$ when acting on a ket; it is
therefore perhaps more convenient to view the rotation as an active rotation of the final
state, i.e., of the ket $e^{-iLx}|q\rangle$ whose dual is $\langle q|e^{iLx}$, in which case the argument $q$ of
the bra is taken to a new value $q(x) = R(x)q$, where $R(x)$ is the ordinary counter-clockwise
rotation matrix in two dimensional space. The transition amplitude is then

$$\langle q|e^{iLx}|k\rangle = \langle q(x)|k\rangle = \frac{1}{2\pi}\,e^{iq(x)\cdot k} , \tag{3.5}$$

following trivially from the inner product between $q$ and $k$ eigenstates.

Similarly, from the viewpoint of the "reaction variable" $x$, the apparatus
initial state $\phi_i(x)$ describes the prior experimental control on the back-reaction.
Consider therefore the wave function $\phi_i(x)$ representing the tightest possible control
on the back-reaction, namely, one for which the rotation angle $x$ is ensured not
to deviate by more than $\epsilon$ from a mean angle $\bar x$. This defines for us what we shall
term a "window" test function, a square pulse of width $2\epsilon$ centered at $x = \bar x$:

$$\phi_i(x|\epsilon\bar x) = \begin{cases} \frac{1}{\sqrt{2\epsilon}} & \text{if } |x - \bar x| < \epsilon , \\ 0 & \text{otherwise} . \end{cases} \tag{3.6}$$

Its Fourier transform, which for simplicity we distinguish by the argument $p$ only,
is the well-known "sinc" function

$$\phi_i(p|\epsilon\bar x) = \sqrt{\frac{\epsilon}{\pi}}\,\frac{\sin(p\epsilon)}{p\epsilon}\,e^{-ip\bar x} . \tag{3.7}$$

In spite of the fact that the resultant probability distribution has an infinite
variance, a natural width is nevertheless defined by $\pi/\epsilon$, as approximately 90% of the
probability mass is concentrated within the central lobe bounded by $p = \pm\pi/\epsilon$. If
the dispersion in the back-reaction is therefore constrained to be less than a full
rotation, i.e., $\epsilon < \pi$, it is then guaranteed that the "noise" $\pi/\epsilon$ will exceed the maximum
width compatible with clearly distinguishing the integer-valued spectrum of $L$, which
would require $\pi/\epsilon < 1$.
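
The figure of roughly 90% for the central lobe can be checked with a short numerical integration. In the scaled variable $u = p\epsilon$ the density is $\sin^2 u/(\pi u^2)$, which integrates to one over the real line, so the result does not depend on $\epsilon$. The function name and the number of integration steps below are arbitrary choices of this sketch.

```python
import math


def central_lobe_fraction(n_steps: int = 20_000) -> float:
    """Fraction of the sinc^2 probability inside |p| < pi/eps.

    Midpoint rule for the integral of sin(u)^2 / (pi u^2) over [-pi, pi].
    With an even n_steps the midpoints never land exactly on u = 0.
    """
    h = 2.0 * math.pi / n_steps
    total = 0.0
    for j in range(n_steps):
        u = -math.pi + (j + 0.5) * h
        total += (math.sin(u) / u) ** 2
    return total * h / math.pi


if __name__ == "__main__":
    print(central_lobe_fraction())   # about 0.90
```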

It is under such small-angle conditions that the weak value of $L$ becomes an
appropriate description of the pointer variable response. As we can see, using (3.5)
and (3.6), and taking care of the normalizing factor, the relative wave function for
the window test function may be written as

$$\phi_f(p|\epsilon\bar x) = \frac{1}{\sqrt{4\pi\epsilon}}\int_{\bar x-\epsilon}^{\bar x+\epsilon} dx\,e^{-ipx+iS(x)} , \tag{3.8}$$

where the phase $S(x)$ is seen to be an oscillating function

$$S(x) = q(x)\cdot k = |q||k|\cos(x + \theta_0) . \tag{3.9}$$

Here, $\theta_0$ is defined to be the angle from $k$ to $q$. From the integral representation
(3.8), it is straightforward to derive an exact expression connecting the average shift,
i.e., the expectation value $\langle p\rangle_f = \langle\phi_f|p|\phi_f\rangle$, with the phase gradient $S'(x)$. For this
one notes that since the support of the integrand is strictly bounded, $\phi_f(p|\epsilon\bar x)$ must
be an entire function with all derivatives defined; we may then use the replacement
$x \to i\frac{d}{dp}$ to pull the phase factor outside the integral and replace it by a differential
operator

$$e^{iS\left(i\frac{d}{dp}\right)} . \tag{3.10}$$

The action of this operator on $p$ then yields an operator that is self-adjoint with
respect to the initial state $|\phi_i\rangle$:

$$e^{-iS\left(i\frac{d}{dp}\right)}\,p\,e^{iS\left(i\frac{d}{dp}\right)} = p + S'\!\left(i\tfrac{d}{dp}\right) . \tag{3.11}$$

Thus, the expectation value of the data reads:

$$\langle p\rangle_f = \langle p\rangle_i + \langle S'(x)\rangle , \tag{3.12}$$

where $\langle p\rangle_i = \langle\phi_i|p|\phi_i\rangle = 0$, and the average shift is the expectation value of the
phase gradient over the sampled window:

$$\langle S'(x)\rangle = \langle\phi_i|S'(x)|\phi_i\rangle = \frac{1}{2\epsilon}\int_{\bar x-\epsilon}^{\bar x+\epsilon} dx\,S'(x) . \tag{3.13}$$

Using the trigonometric form of $S(x)$ as given in Eq. (3.9), we may further express
this average as

$$\langle S'(x)\rangle = S'(\bar x)\,\frac{\sin\epsilon}{\epsilon} , \tag{3.14}$$

which shows that a small angle condition on the sampling window, $2\epsilon \ll 1$, ensures
that the average shift is essentially the phase gradient evaluated at the sampling
point $\bar x$. And finally, as one can then verify, this local phase gradient is in fact a
weak value of $L$,

$$S'(x) = -i\frac{d}{dx}\log\langle q(x)|k\rangle = \frac{\langle q(x)|L|k\rangle}{\langle q(x)|k\rangle} , \tag{3.15}$$

namely in this case the classical angular momentum $q(x)\wedge k$ corresponding to a pre-
and post-selected ensemble where the final position eigenstate $|q\rangle$ is rotated by the
angle $x$.
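
As a numerical illustration of the window average, the sketch below integrates the phase gradient $S'(x) = -|q||k|\sin(x + \theta_0)$ over a window and compares it with the closed form $S'(\bar x)\sin\epsilon/\epsilon$. The parameter values are those quoted in the caption of Fig. 3.1; the function names and the midpoint-rule quadrature are our own illustrative choices.

```python
import math


def local_weak_value(qk: float, theta0: float, x: float) -> float:
    """Local weak value lambda(x) = S'(x) for S(x) = qk * cos(x + theta0)."""
    return -qk * math.sin(x + theta0)


def window_average_shift(qk: float, theta0: float, x_bar: float, eps: float,
                         n_steps: int = 20_000) -> float:
    """Average of S'(x) over the window |x - x_bar| < eps (midpoint rule)."""
    h = 2.0 * eps / n_steps
    total = 0.0
    for j in range(n_steps):
        x = x_bar - eps + (j + 0.5) * h
        total += local_weak_value(qk, theta0, x)
    return total * h / (2.0 * eps)


if __name__ == "__main__":
    qk, theta0, x_bar, eps = 25.0, math.pi / 2, math.pi / 8, math.pi / 32
    numeric = window_average_shift(qk, theta0, x_bar, eps)
    closed = local_weak_value(qk, theta0, x_bar) * math.sin(eps) / eps
    print(numeric, closed)   # both close to -23.06
```

For these values the factor $\sin\epsilon/\epsilon$ differs from unity by less than 0.2%, so the average shift is very nearly the local weak value at the sampling point.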

Thus we conclude that if the window is centered around some entirely
arbitrary "sampling point" $\bar x$, and is sufficiently narrow so that it satisfies a small
angle condition, then the average conditional displacement of the pointer variable is
essentially a weak value, what we shall call a local weak value $\lambda(x) = S'(x)$, evaluated
at the sampling point:

$$\lambda(\bar x) \equiv S'(\bar x) = \frac{\langle q|e^{iL\bar x}L|k\rangle}{\langle q|e^{iL\bar x}|k\rangle} . \tag{3.16}$$

The point therefore is that if one looks at $x$ as the angle parameter of the
transformation induced by $L$, then, as the transformation becomes a well-defined unitary
transformation $\sim e^{iL\bar x}$ by virtue of the small uncertainty $\Delta x$, the local weak
value evaluated at the mean angle determines the conditional response of the pointer
variable. From this perspective, the "standard" weak value $\lambda = q\wedge k$ may hence be
seen as the resulting shift in a special, canonical, weak measurement, namely one in
which the sampling point approximately determines a null rotation of the system.

It is worth remarking that while the aforementioned result concerns the
relation between the conditional expectation value of the pointer variable and the local
weak value under small angle conditions, it does not say anything about the
resultant shape of the pointer variable distribution obtained from $\phi_f(p|\epsilon\bar x)$. However,
it is always possible to impose more restrictive conditions on the size $\epsilon$ in order to
ensure that the shift occurs with minimal distortion of the overall shape of the initial
packet $\phi_i(p|\epsilon\bar x)$, and hence of the resulting conditional probability distribution
(Fig. 3.1).

As one sees, the Fourier integral (3.8) shows an analogy with the propagation
of an almost-monochromatic beam through a dispersive medium, where $x$ plays
the role of the wave number and $S(x)$ the dispersion relation. The relative wave
function $\phi_f(p|\epsilon\bar x)$ may thus be interpreted as the result of propagating the initial
packet $\phi_i(p|\epsilon\bar x)$ through this medium after unit time, in which case the local weak
value corresponds to the group velocity. If $\epsilon$ is therefore small enough that the
non-linear behavior of the phase $S(x)$ around the sampling point may be
neglected altogether, then the integral (3.8) can be performed in a "group velocity"
approximation, in which case the relative wave function is, up to phase factors, the
initial wave function rigidly translated by the local weak value $\lambda(\bar x)$:

$$\phi_f(p|\epsilon\bar x) \simeq \phi_i(p - \lambda(\bar x)|\epsilon\bar x)\,e^{i[S(\bar x) - \lambda(\bar x)\bar x]}
= \sqrt{\frac{\epsilon}{\pi}}\,\frac{\sin[(p - \lambda(\bar x))\epsilon]}{(p - \lambda(\bar x))\epsilon}\,e^{i[S(\bar x) - p\bar x]} . \tag{3.17}$$

Expanding the phase as

$$S(x) = S(\bar x) + \lambda(\bar x)(x - \bar x) + \frac{1}{2}\lambda'(\bar x)(x - \bar x)^2 + \ldots , \tag{3.18}$$

we see that the linearity condition is ensured if

$$\lambda'(\bar x)\,\epsilon^2 \ll 1 . \tag{3.19}$$

While for small angular momenta ($|q||k| \ll 1$) linearity is essentially guaranteed by
the small-angle condition on $\epsilon$, for $|q||k| \gg 1$ linearity demands a much tighter
control of the dispersion in the rotation angle, namely $\epsilon \sim O(1/\sqrt{|q||k|})$. This means
that the shape of the initial packet is preserved when the effective width in the
pointer variable $p$ is of order $\sqrt{|q||k|}$, which is considerably larger than the
eigenvalue spacing. Note, however, that in the limit where $|q||k| \to \infty$, this large width
nevertheless becomes insignificant relative to the overall shift in $p$, which should
be of order $|q||k|$. This shows that for boundary conditions that are approximately
classical, it is possible to guarantee a statistically significant effect on the pointer
variable that is a rigid shift proportional to the classical angular momentum.

Figure 3.1: Sampling the local weak value $\lambda(x) = q(x)\wedge k$ over a narrow window. In this
example, $|q||k| = 25$, $\theta_0 = \pi/2$, $\bar x = \pi/8$ and $\epsilon = \pi/32$. The system is therefore
rotated by an angle $\pi/8 \pm \pi/32$, and the sampled weak value is approximately
$\lambda(\bar x) \simeq -23$.
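
The rigid-shift picture can be illustrated by evaluating the window integral for the relative wave function numerically and locating the peak of the resulting distribution. The sketch below assumes the phase $S(x) = |q||k|\cos(x + \theta_0)$ and the parameter values $|q||k| = 25$, $\theta_0 = \pi/2$, $\bar x = \pi/8$, $\epsilon = \pi/32$; the function names, the $p$ grid and the quadrature are illustrative choices only.

```python
import cmath
import math


def relative_wavefunction(p, qk, theta0, x_bar, eps, n_steps=400):
    """Midpoint-rule estimate of (4 pi eps)^(-1/2) * integral of exp(-i p x + i S(x))
    over |x - x_bar| < eps, with S(x) = qk * cos(x + theta0)."""
    h = 2.0 * eps / n_steps
    acc = 0j
    for j in range(n_steps):
        x = x_bar - eps + (j + 0.5) * h
        acc += cmath.exp(1j * (qk * math.cos(x + theta0) - p * x))
    return acc * h / math.sqrt(4.0 * math.pi * eps)


def peak_position(qk, theta0, x_bar, eps, p_min=-60.0, p_max=15.0, dp=0.1):
    """Grid point at which |phi_f(p)|^2 is largest."""
    n = int(round((p_max - p_min) / dp)) + 1
    grid = [p_min + dp * j for j in range(n)]
    return max(grid,
               key=lambda p: abs(relative_wavefunction(p, qk, theta0, x_bar, eps)) ** 2)


if __name__ == "__main__":
    qk, theta0, x_bar, eps = 25.0, math.pi / 2, math.pi / 8, math.pi / 32
    print(peak_position(qk, theta0, x_bar, eps))   # near lambda(x_bar)
    print(-qk * math.sin(x_bar + theta0))          # lambda(x_bar), about -23.1
```

The central lobe is broad (of width of order $\pi/\epsilon = 32$), yet its maximum sits at the local weak value, well to the side of the unshifted packet centered at $p = 0$.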

3.3 Superpositions of Weak Measurements 

On the basis of this local picture, it is then possible to develop an alternative
interpretation of the relative wave function $\phi_f(p)$ away from the weak regime, or in other
words, when the dispersion in the back-reaction angle is considerable. For this we
note that given an arbitrary initial apparatus state $|\phi_i\rangle$, the wave function in $x$ can
always be "chopped" into non-overlapping windowed wave functions $\phi_i(x|n)$,

$$\phi_i(x) = \sum_n \sqrt{P(n|\phi_i)}\,\phi_i(x|n) , \tag{3.20}$$

where

$$\phi_i(x|n) = \begin{cases} \phi_i(x)/\sqrt{P(n|\phi_i)} & \text{if } |x - x_n| < \epsilon_n , \\ 0 & \text{if } |x - x_n| > \epsilon_n , \end{cases}
\qquad P(n|\phi_i) = \int_{x_n-\epsilon_n}^{x_n+\epsilon_n} dx\,|\phi_i(x)|^2 , \tag{3.21}$$

and where, if $n$ and $n + 1$ are contiguous cells, then $x_{n+1} - x_n = \epsilon_{n+1} + \epsilon_n$. If
the "chopping" in Eq. (3.20) is sufficiently fine so that within each window either a
small angle condition is satisfied, or, more restrictively, a local linear expansion of
the phase is valid, then the relative wave function in the $p$ representation may be
approximated as a coherent superposition of overlapping (but nevertheless orthogonal)
wave functions, each of which gets shifted by the appropriate local weak value. In
particular, if the "group velocity" approximation is valid within each window, then
it is the overall shape of the Fourier transform $\phi_i(p|n)$ of each windowed function
$\phi_i(x|n)$ which gets shifted, in which case the relative wave function expands as

$$\phi_f(p) \simeq \sum_n \sqrt{P(n|\phi_i)}\,\phi_i(p - \lambda(x_n)|n)\,e^{i[S(x_n) - \lambda(x_n)x_n]} . \tag{3.22}$$

Thus, one may think of a measurement given an arbitrary preparation of the
apparatus as a coherent superposition of weak measurements, each sampling a weak
value at a different sampling point $x_n$.



3.4 Illustration: Eigenvalue Quantization in a Strong 
Measurement 

For the boundary conditions in question, the sampling picture suggests that when 
the initial pointer wave function 4>i(p) is sufficiently narrow that the eigenvalues 
of L become distinguishable, one may equivalently view the resultant conditional 
probability distribution as an interference effect arising from sampling the classical 
angular momentum over a large range of x. 

We have tried to illustrate this interference effect in Figures 3.2 and 3.3
for the same boundary conditions of Fig. 3.1, $|q||k| = 25$ and $\theta_0 = \pi/2$, for which
the local weak value is

$$\lambda(x) = -25\cos(x) . \tag{3.23}$$

The initial wave function is in this case taken to be a minimum uncertainty packet
in $x$,

$$\phi_i(x) = \frac{1}{(2\pi\sigma_x^2)^{1/4}}\,e^{-x^2/4\sigma_x^2} , \tag{3.24}$$

of spread $\sigma_x = \pi$. Its Fourier transform is then a Gaussian in $p$ with a spread
$\sigma_p = 1/2\pi \simeq 0.16$, which is much smaller than the eigenvalue separation. The
sampling is performed at equal intervals of $x_n = n\pi/4$, so $\epsilon = \pi/8$ for each window.
In the $p$ representation, each of the windowed functions $\phi_i(p|n)$ is then approximately
a "sinc" function centered at $p = 0$, and modulated by a phase factor $\sim e^{-ipx_n}$, as
in Eq. (3.7). Each of these gets shifted approximately by the local weak value
$\lambda(x_n) = -|q||k|\cos(x_n)$. To illustrate how $\phi_f(p)$ is built up from the interference
of the shifted windowed functions $\phi_f(p|n)$, Fig. 3.2 shows the real part of the
latter, scaled by the appropriate weights $\sqrt{P(n|\phi_i)}$. The net sum of the imaginary
parts cancels out by symmetry. The cosine curve of $\lambda(x)$, also shown in Fig. 3.1, is
clearly discernible from the array of these shifted functions. Note that the amplitude
of this curve determines the region where the resultant probability distribution,
shown in Fig. 3.3, is appreciable.

The emergence of a quantized structure in this distribution may then be
understood from the periodicity of the weak value as follows: to a given window
with $x_n \in [-\pi, \pi)$, there correspond an infinite number of other windows at different
winding numbers, i.e., $x_n \pm 2\pi$, $x_n \pm 4\pi$, ..., where the same weak value is sampled.
Each of these "secondary" samples yields approximately the same partial wave
function, except for an additional relative phase factor $e^{\mp ip2\pi}$, $e^{\mp ip4\pi}$, ..., weighted by
a relatively slowly decaying weight factor $P(n|\phi_i)$. The phases therefore interfere
constructively when $p$ is an integer and destructively when $p$ is a half integer. Very
roughly, then, one may understand the resultant interference pattern in $\phi_f(p)$ as the
product of two terms: First, a rapidly varying factor

$$\sum_{m=-\infty}^{\infty}\Delta(p - m) , \tag{3.25}$$

where $\Delta(p)$ is a sharply peaked function at $p = 0$. This accounts for the global
periodic behavior of the local weak value. The second factor yields an envelope to
the modulation factor which accounts for the average contribution of the samples
within a given period, for instance the samples $x_n$ lying between $-\pi$ and $\pi$. To a
first approximation, the envelope may be obtained by replacing the Gaussian shape
of $\phi_i(x)$ by a flat distribution within the interval, in which case the resultant wave
function is proportional to that of a window test function centered at $\bar x = 0$ and
covering one complete revolution, i.e., $\epsilon = \pi$:

$$\phi_f(p|\bar x = 0, \epsilon = \pi) = \frac{1}{2\pi}\int_{-\pi}^{\pi} dx\,e^{-ipx - i|q||k|\sin(x)} . \tag{3.26}$$

This rough decomposition becomes increasingly accurate in the limit $\sigma_x \to \infty$, where
the product $\phi_i(x)e^{iS(x)}$ becomes invariant under translations in $x$ modulo $2\pi$; the
two factors (3.25) and (3.26) correspond then to a decomposition in terms of Bloch
states, with the $\Delta(p)$ in (3.25) replaced by a true $\delta(p)$.

A consequence of this decomposition is then that, up to a normalization,
the second factor must yield for integer values $p = m$ the transition amplitude
$\langle q|\Pi(m)|k\rangle$ for an intermediate projection onto an eigenvalue $m$ of $L$. For such
values, the integral is easily obtained in closed form in terms of Bessel functions:

$$\phi_f(p = m|\bar x = 0, \epsilon = \pi) = (-1)^m J_m(|q||k|) . \tag{3.27}$$

A continuous envelope for the probability distribution, indicated in Fig. 3.3 by
the dotted line, is then $J_{|p|}^2(|q||k|)$ times an appropriate normalization constant.



3.5 Error Laws 

Since the measurement is in this case clearly a strong measurement, the trace of
weak values, when seen from the sampling picture, is practically washed out by the
interference of the different samples. It is nevertheless instructive to note how two
statistics of the resultant probability distribution can still be connected
to the picture of sampling weak values.

First of all, let us note the trace that remains from the fact that the dominant
sample is the one centered at $x = 0$, where the weak value is $-|q||k|$. This can
be seen clearly from the asymmetry in the wave function $\phi_f(p)$ in Fig. 3.2, an
asymmetry that is barely noticeable once the amplitudes are squared, as seen in Fig.
3.3. Nevertheless, the asymmetry leads to a slight bias of the distribution towards
the predominantly sampled weak value, a bias that is absent altogether in an ideal
strong measurement. The bias, or the mean displacement $\langle p\rangle_f = \langle\phi_f|p|\phi_f\rangle$, is easily
calculated as was done earlier for the case of a narrow window. We note that a
Gaussian wave function has all moments defined for both $x$ and $p$ and is analytic in
either domain. Thus, the effect of multiplying $\phi_i(x)$ by a phase factor $e^{iS(x)}$ may
again be described in the Heisenberg picture as a shift of the pointer variable
operator

$$p_f = p_i + S'(x) = p_i + \lambda(x) . \tag{3.28}$$

Figure 3.2: A relative wave function for a strong measurement built up as a
superposition of weak measurements (see text).

Figure 3.3: The resultant probability distribution function $\phi_f^2(p)$, and its envelope
$\propto J_{|p|}^2(|q||k|)$.

Thus, for a Gaussian initial state $|\phi_i\rangle$ centered initially at zero in $p$, the average
conditional shift is

$$\langle p_f\rangle = \langle\lambda(x)\rangle , \tag{3.29}$$

where $\langle\lambda(x)\rangle$ is the expectation value of the local weak value over the probability
distribution in $x$, $dP(x|\phi_i) = dx\,|\phi_i(x)|^2$:

$$\langle\lambda(x)\rangle = \int_x dP(x|\phi_i)\,\lambda(x) . \tag{3.30}$$

It is then truly the average sampled weak value in the limit where the samples
become infinitesimally narrow. For a normal distribution in $x$, centered at $x = 0$
and with $\lambda(x) = -|q||k|\sin(x + \theta_0)$, the mean weak value is easily obtained:

$$\langle\lambda(x)\rangle = \lambda(0)\,e^{-\sigma_x^2/2} . \tag{3.31}$$

The mean conditional displacement is hence the weak value evaluated at the peak
of the distribution and scaled by an exponential suppression factor $e^{-\sigma_x^2/2}$.

A more conspicuous trace of the sampling picture is, as mentioned earlier, the
connection that exists between the amplitude of the weak value curve and the width
of the resultant probability distribution. This connection can now be expressed in
terms of an intuitive "error" law connecting the initial and final variances $\Delta p_i^2 =
\langle\phi_i|p^2|\phi_i\rangle$ and $\Delta p_f^2 = \langle\phi_f|p^2|\phi_f\rangle - \langle p\rangle_f^2$, which follows from Eq. (3.28). Taking the
square of this equation we have

$$p_f^2 = p_i^2 + \{p_i, \lambda(x)\} + \lambda(x)^2 . \tag{3.32}$$

It is easily shown that for the Gaussian packet, the expectation value of the
anticommutator of $p$ and $\lambda(x)$ vanishes. Thus, we have

$$\Delta p_f^2 = \Delta p_i^2 + \Delta\lambda^2 , \tag{3.33}$$

where $\Delta\lambda^2$ is the variance $\langle\lambda(x)^2\rangle - \langle\lambda(x)\rangle^2$ of the local weak value over the
probability distribution $dP(x|\phi_i)$:

$$\Delta\lambda^2 = \int dP(x|\phi_i)\,(\lambda(x) - \langle\lambda(x)\rangle)^2 . \tag{3.34}$$

Again this corresponds to the variance of the sampled weak values in the limit of
infinitely narrow samples. In our case this variance is given by

$$\Delta\lambda^2 = \frac{|q|^2|k|^2}{2}\left[1 - \cos(2\theta_0)\,e^{-2\sigma_x^2}\right] - \lambda(0)^2\,e^{-\sigma_x^2} , \tag{3.35}$$

which is essentially the squared r.m.s. value of the local weak value $\lambda(x)$ on its curve.
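
The mean and variance of the sampled weak values can be checked by direct numerical averaging over a Gaussian distribution in $x$. The sketch below assumes $\lambda(x) = -|q||k|\sin(x + \theta_0)$ and compares against closed forms that follow from the Gaussian averages $\langle\cos x\rangle = e^{-\sigma_x^2/2}$ and $\langle\cos 2x\rangle = e^{-2\sigma_x^2}$; the function name and integration settings are our own.

```python
import math


def gaussian_weak_value_stats(qk: float, theta0: float, sigma_x: float,
                              n_steps: int = 40_000, n_sigma: float = 10.0):
    """Mean and variance of lambda(x) = -qk*sin(x + theta0) under a zero-mean
    normal distribution in x with standard deviation sigma_x (midpoint rule)."""
    half = n_sigma * sigma_x
    h = 2.0 * half / n_steps
    norm = mean = second = 0.0
    for j in range(n_steps):
        x = -half + (j + 0.5) * h
        w = math.exp(-x * x / (2.0 * sigma_x ** 2))   # unnormalized density
        lam = -qk * math.sin(x + theta0)
        norm += w
        mean += w * lam
        second += w * lam * lam
    mean /= norm
    return mean, second / norm - mean ** 2


if __name__ == "__main__":
    qk, theta0, sigma_x = 25.0, math.pi / 2, math.pi
    mean, var = gaussian_weak_value_stats(qk, theta0, sigma_x)
    print(mean, -qk * math.exp(-sigma_x ** 2 / 2))   # strongly suppressed mean
    print(var, qk ** 2 / 2)                          # variance close to (qk)^2 / 2
```

For $\sigma_x = \pi$ the mean is suppressed to a small fraction of $\lambda(0) = -25$, while the variance is already very close to its strong-measurement limit $|q|^2|k|^2/2$.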

Finally, to see that these quantities do in fact coincide in the limit of strong
measurements with similar quantities obtained from the conditional spectral
distribution, we recall that this spectral distribution is given by

$$P(m|qk) \propto J_m^2(|q||k|) . \tag{3.36}$$

In fact, we note that the proportionality constant is unity since the Bessel functions
satisfy

$$\sum_{m=-\infty}^{\infty} J_m^2(z) = 1 \tag{3.37}$$

for all $z$. The conditional average of $m$ is then

$$\langle m\rangle = \sum_{m=-\infty}^{\infty} J_m^2(z)\,m , \tag{3.38}$$

which vanishes by symmetry. This coincides with Eq. (3.31) in the limit $\sigma_x \to \infty$.
Similarly, we note the identity

$$\langle m^2\rangle = \sum_{m=-\infty}^{\infty} J_m^2(z)\,m^2 = \frac{z^2}{2} . \tag{3.39}$$

Letting $z = |q||k|$, we see that again this coincides with Eq. (3.35) in the same limit.
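
The Bessel-function sums quoted above can be verified numerically. The sketch below evaluates $J_m(z)$ from the standard integral representation $J_m(z) = \frac{1}{\pi}\int_0^\pi \cos(m\tau - z\sin\tau)\,d\tau$ with a midpoint rule; the function names, the number of quadrature points and the cutoff `m_max` are illustrative assumptions.

```python
import math


def bessel_j(m: int, z: float, n_steps: int = 400) -> float:
    """J_m(z) for integer m via (1/pi) * integral_0^pi cos(m t - z sin t) dt."""
    h = math.pi / n_steps
    total = 0.0
    for j in range(n_steps):
        t = (j + 0.5) * h
        total += math.cos(m * t - z * math.sin(t))
    return total * h / math.pi


def spectral_moments(z: float, m_max: int = 80):
    """Return (sum P, sum m P, sum m^2 P) for P(m) = J_m(z)^2 with |m| <= m_max."""
    probs = {m: bessel_j(m, z) ** 2 for m in range(-m_max, m_max + 1)}
    s0 = sum(probs.values())
    s1 = sum(m * pm for m, pm in probs.items())
    s2 = sum(m * m * pm for m, pm in probs.items())
    return s0, s1, s2


if __name__ == "__main__":
    s0, s1, s2 = spectral_moments(25.0)
    print(s0, s1, s2)   # close to 1, 0 and 25**2 / 2 = 312.5
```

The second moment reproduces $z^2/2$, which is the large-$\sigma_x$ limit of the variance of the sampled local weak values.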
3.6 Summary: Local Weak Values



In Chapter 2 we saw how, relative to a given post-selection, the unconditional
statistics of the pointer variable break up as

$$dP(p|\Psi_f) = \sum_\mu P(\psi_\mu|\Psi_f)\,dP(p|\phi_f^{(\mu)}) , \tag{3.40}$$

where $dP(p|\phi_f^{(\mu)})$ is the conditional probability distribution of the pointer variable
obtained from the state of the apparatus relative to a final condition $|\psi_\mu\rangle$:

$$|\phi_f^{(\mu)}\rangle = \frac{1}{\sqrt{P(\psi_\mu|\Psi_f)}}\,\langle\psi_\mu|e^{iAx}|\psi_1\rangle\,|\phi_i\rangle . \tag{3.41}$$

It was furthermore argued that when the initial apparatus state satisfies
appropriate weakness conditions, this conditional distribution can be interpreted
approximately in terms of a weak linear model (WLM) as

$$dP(p|\phi_f^{(\mu)}) \simeq dP(p - \alpha_\mu|\phi_i) , \tag{3.42}$$

where $\alpha_\mu = \mathrm{Re}\,\frac{\langle\psi_\mu|A|\psi_1\rangle}{\langle\psi_\mu|\psi_1\rangle}$. The picture that we then wished to associate with this
model was that when the conditions of the measurement are such that the measured
system and the apparatus behave almost as independent entities, the system imparts
a well-defined "kick" to the apparatus proportional to $\alpha_\mu$. Our purpose then was to
see how this picture could be extended to more general apparatus conditions that
do not satisfy the appropriate requirements of weakness.

This is the picture of sampling weak values for which we gave a preliminary
illustration in this chapter. What we have illustrated here is that in the case where
the transition amplitude in Eq. (3.41) is a pure phase factor, i.e.,

$$ \langle\psi_\mu|e^{i\hat{A}x}|\psi_i\rangle = (\mathrm{const})\, e^{iS(x)} \tag{3.43} $$

where S(x) is a real function of x, it is possible to develop a simple picture of the
relative state as a coherent superposition of weak measurements. The idea is then to
think of the initial state of the apparatus as a superposition of non-overlapping
"sample states", each of which has a wave function in the x representation of bounded
support within $\pm\epsilon_n$ around a given value $x_n$. Each sample may then be considered
as implementing a weak measurement if the variation of the phase gradient S'(x) is
small within the interval $x_n \pm \epsilon_n$, in which case the mean displacement of the pointer
variable is essentially a local weak value of the measured observable $\hat{A}$, evaluated in
a configuration where the initial and final state vectors are rotated relative to each
other in Hilbert space by an intermediate unitary transformation $e^{i\hat{A}x_n}$:

$$ \alpha_\mu(x_n) = S'(x_n) = \mathrm{Re}\, \frac{\langle\psi_\mu|\hat{A}\, e^{i\hat{A}x_n}|\psi_i\rangle}{\langle\psi_\mu|e^{i\hat{A}x_n}|\psi_i\rangle} . \tag{3.44} $$



Finally, we saw how, despite the fact that in the general, non-weak case the shape
of the pointer variable wave function is significantly altered because of interference
between the different samples, it is nevertheless still possible to connect the picture of
sampling weak values to two statistics of the conditional pointer variable distribution
$dP(p|\phi_f^{(\mu)})$, namely the mean and the variance of p, according to:

$$ \begin{aligned} \langle p\rangle_f &= \langle p\rangle_i + \langle\alpha_\mu(x)\rangle \\ \Delta p_f^2 &= \Delta p_i^2 + \Delta\alpha_\mu^2 \end{aligned} \tag{3.45} $$

where $\langle p\rangle_i$ and $\Delta p_i^2$ are the initial mean and variance of the pointer variable and
$\langle\alpha_\mu(x)\rangle$ and $(\Delta\alpha_\mu(x))^2$ are the mean and variance of the local weak value evaluated
over the probability distribution for x defined by the initial state of the apparatus,
i.e. $dP(x|\phi_i) = dx\, |\phi_i(x)|^2$. The latter correspond then to the mean and variance
of sampled local weak values in the limit when the samples become infinitesimally
narrow in x.

The problem that concerns us now is how to interpret the picture of sampling
weak values in the more general situation in which the amplitude function
$\langle\psi_\mu|e^{i\hat{A}x}|\psi_i\rangle$ in (3.41) is not necessarily a pure phase factor. From the point of
view of the weak measurement regime, this more general situation would entail an
imaginary component of the local weak value, and the associated effects that were
briefly mentioned in the last chapter.

The contention here is however that a more convenient and intuitive description
is provided by trading in this imaginary component for another real function,
which we shall simply call the "likelihood factor" $L_\mu(x)$, defined as

(3.46)

The intuition stems from a correspondence, both formal and in particular cases
quantitative, that can be established between the sampling picture based on $\alpha_\mu(x)$
and $L_\mu(x)$ and a classical probabilistic description of the measurement interaction
with mixed boundary conditions on the system.

Chapter 4

Bayes' Theorem and Retrodiction in Classical Measurement

The second insight into the model comes from drawing parallels with the classical 
description of the measurement. According to classical mechanics, it should in 
principle be possible to measure any quantity, with perfect precision, and without 
back-reaction on the system. This ideal situation demands however an ideal control 
of the initial conditions of the apparatus which may not be available. As it then turns 
out, the problem of retrodiction in the classical description is not entirely trivial once 
this control is lost. The problem has to do with the fact that as we move forward in 
time, the probabilities we assign to the classical state of a system, i.e., the point in 
phase space, may change in either of two ways: because of the mechanical evolution, 
or else because of acquisition of new information. When the system is controlled for 
both initial and final conditions, both "effects" are confounded in the probabilities 
and some care is needed to disentangle them. Fortunately, the classical description 
provides sufficiently clear formal criteria to distinguish what is "mechanics" and 
what is "information". Our aim will then be to establish a formal correspondence 
between these criteria and elements in the quantum description. 

4.1 Prior and Posterior Probabilities 

According to the Bayesian view of probability [20, 21, 22], a probability statement
about a possible situation X is always viewed relative to a particular set of stated
conditions Y. Hence the symbol

$$ P(X|Y) \tag{4.1} $$

to denote the probability of X when Y is known. A typical problem of inference 
occurs when one starts with knowledge only of Y, but later finds out additional 
information, for instance, that a certain other condition Z is indeed satisfied. If Z 
is a relevant condition, then one intuitively expects the probabilities to change. The 
problem is then to find how the a priori probabilities are modified to a posteriori 
probabilities in light of this additional condition. The solution to this problem is 
given by Bayes' theorem. 

To see how this comes about, we recall the product rule of probability, which
states that

$$ P(XZ|Y) = P(X|YZ)\, P(Z|Y) , \tag{4.2} $$

where XZ stands for X "and" Z. The product rule can however be applied in the
reverse order, in which case

$$ P(XZ|Y) = P(Z|XY)\, P(X|Y) . \tag{4.3} $$

Equating (4.2) and (4.3), we find that

$$ P(X|YZ) = P(X|Y)\, \frac{P(Z|XY)}{P(Z|Y)} . \tag{4.4} $$

The last line is Bayes' theorem. It states that the posterior probability of X given
Y and Z is the prior probability of X given Y, multiplied by a factor

$$ L(X) = \frac{P(Z|XY)}{P(Z|Y)} \tag{4.5} $$

commonly known as the Likelihood factor. The effect of this factor is to increase
(decrease) the prior probability for those values of X for which Z is more (less)
likely to occur given Y and X, as one may expect intuitively.

We finally note two facts about the passage from prior to posterior distributions:

First, from the product and sum rules, it is easy to see that the probability
P(Z|Y) in the denominator of the Likelihood factor is

$$ P(Z|Y) = \sum_X P(ZX|Y) = \sum_X P(Z|XY)\, P(X|Y) . \tag{4.6} $$

This tells us that P(Z|Y) can be determined from the normalization condition on
the posterior probability, as expected.
Second, while the posterior probability can be determined from knowledge
of the conditional probability P(Z|XY) and the prior probability P(X|Y) for all
values of X, it is generally not possible to determine the prior probability from
knowledge of the posterior probability and P(Z|XY) alone. This can clearly be
seen if we consider a situation in which P(Z|XY) is zero for a given range of X.
In this case, any number of prior probability distributions are compatible with the
same posterior probability. This tells us then that the transformation from prior to
posterior probability with a fixed likelihood factor $L(X) \propto P(Z|XY)$ is generally
an irreversible mapping in the space of probability distributions.
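
A small discrete example may make the irreversibility concrete. The sketch below is only an illustration: the three-valued variable X, the numbers and the function name `posterior` are assumptions introduced here. It applies Eqs. (4.4) and (4.6) to two different priors that share the same posterior because the likelihood vanishes on one value of X.

```python
def posterior(prior, likelihood):
    """Bayes' theorem on a discrete set of X values:
    P(X|YZ) = P(X|Y) * P(Z|XY) / P(Z|Y), with P(Z|Y) from Eq. (4.6)."""
    evidence = sum(p * l for p, l in zip(prior, likelihood))  # P(Z|Y)
    if evidence == 0.0:
        raise ValueError("Z has zero probability under this prior")
    return [p * l / evidence for p, l in zip(prior, likelihood)]

# Hypothetical numbers: P(Z|XY) vanishes for the first value of X.
likelihood = [0.0, 0.5, 1.0]
prior_a = [0.2, 0.4, 0.4]
prior_b = [0.6, 0.2, 0.2]

post_a = posterior(prior_a, likelihood)
post_b = posterior(prior_b, likelihood)
print(post_a)
print(post_b)
```

Both priors lead to the same posterior, so the prior cannot be recovered from the posterior and the likelihood alone.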



4.2 Prior Phase-Space Distribution 

We shall now consider a simple classical caricature of the von Neumann scheme as
we developed it in Chapter 2. Here we envision the apparatus as being described
by classical canonical variables x, p, and the system described by a set of generalized
canonical coordinates $\eta$. The measurement interaction is then described by a
classical von Neumann Hamiltonian

$$ H_M = -\delta(t - t_i)\, x A(\eta) . \tag{4.7} $$

Now, denoting the states immediately before and after the measurement by the
suffixes i and f, we can see that the Hamiltonian has two constants of the motion,
x and $A(\eta)$, and thus:

$$ x_f = x_i , \qquad A(\eta_i) = A(\eta_f) . \tag{4.8} $$

From this we know that the pointer variable indeed receives a kick proportional to
the value of A at the time of the measurement,

$$ p_f = p_i + A(\eta_i) . \tag{4.9} $$

On the other hand, as a back-reaction the Hamiltonian also drives other variables
of the system which are not invariant under the phase-space flow induced by $A(\eta)$.
Thus, the most we can say is that the final state of the system $\eta_f$ is connected to
the initial state through some map

$$ \eta_f = \mu(\eta_i, x) \tag{4.10} $$

solving the dynamical equation

$$ \frac{\partial \mu(\eta, x)}{\partial x} = \{A(\eta), \mu(\eta, x)\}_{PB} , \qquad \mu(\eta, x = 0) = \eta , \tag{4.11} $$

where $\{A(\eta), \mu(\eta, x)\}_{PB}$ is the Poisson bracket.

So now suppose we start with some prior information $Y = Y_s Y_a$ about the
initial state of the system and the apparatus, which entails a factorable probability
distribution

$$ dP(x, p_i, \eta_i|Y) = dP(x, p_i|Y_a)\, dP(\eta_i|Y_s) , \tag{4.12} $$

where we drop the label for x as it is a constant of the motion. To obtain the
prior distribution for the final point in phase space, we then use the solutions to the
equations of motion, i.e., Eqs. (4.9) and (4.10):

$$ dP(x, p_f, \eta_f|Y) = \int_{x, p_i, \eta_i} \delta(p_f - p_i - A(\eta_i))\, \delta(\eta_f - \mu(\eta_i, x))\, dP(x, p_i|Y_a)\, dP(\eta_i|Y_s) . \tag{4.13} $$

This transformation may also be viewed in the more familiar passive sense, i.e., as
a map in the space of phase-space distributions

$$ dP(x, p, \eta|Y; i) \to dP(x, p, \eta|Y; f) , \tag{4.14} $$

in which case we view the point in phase space as held fixed and the distribution
function evolving from $dP(x, p, \eta|Y; i) = dP(x, p|Y_a; i)\, dP(\eta|Y_s; i)$ to the new
distribution $dP(x, p, \eta|Y; f)$ according to Liouville flow. Denoting the generator of phase
flow associated with a given function f as

$$ \pounds_f = \{f, \,\cdot\,\}_{PB} , \tag{4.15} $$

and for simplicity defining the generator of the phase flow induced by the measurement
as

$$ \pounds_S = A(\eta)\, \pounds_x + x\, \pounds_A , \tag{4.16} $$

the final prior distribution is then

$$ dP(x, p, \eta|Y; f) = e^{-\pounds_S}\, dP(x, p, \eta|Y; i) = dP(x, p - A(\eta)|Y_a; i)\, e^{-x\pounds_A}\, dP(\eta|Y_s; i) . \tag{4.17} $$

It is clear that this transformation is reversible, as it may be undone by a second
application of the inverse Liouville flow operator $e^{\pounds_S}$.
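
To illustrate the reversible part of the dynamics, the following sketch applies the impulsive map of Eqs. (4.8) to (4.10) to a single phase-space point and then undoes it. It assumes, purely for illustration, a system with one coordinate q and conjugate momentum k and the choice $A(\eta) = q$, for which the flow generated by A simply shifts k by x; the names `State`, `measure` and `unmeasure` are hypothetical.

```python
from typing import NamedTuple

class State(NamedTuple):
    x: float  # reaction variable of the apparatus
    p: float  # pointer variable of the apparatus
    q: float  # system coordinate; here A(eta) = q (illustrative assumption)
    k: float  # system momentum conjugate to q

def measure(s: State) -> State:
    """Impulsive map generated by H_M = -delta(t - t_i) * x * q:
    x and A = q are constants of the motion, p is kicked by A,
    and the system momentum k receives the back-reaction x."""
    return State(s.x, s.p + s.q, s.q, s.k + s.x)

def unmeasure(s: State) -> State:
    """Inverse flow: undoes the pointer kick and the back-reaction."""
    return State(s.x, s.p - s.q, s.q, s.k - s.x)

initial = State(x=0.3, p=-1.0, q=2.0, k=0.5)
final = measure(initial)
recovered = unmeasure(final)
print(final)
print(recovered)
```

In this toy case the map on points is one-to-one, which is the content of the reversibility statement for the prior distribution.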

4.3 Posterior Distribution 

Consider however what happens when we acquire, by some other means, new information
$Z_s$ about the state of the system after this measurement interaction. Since
the probability of $Z_s$ now depends on the back-reaction on the system, we must
then re-assess all our prior information, both about the initial and final points in
phase space. To do this let us use Bayes' theorem as described above to obtain the
posterior distribution for the initial state:

$$ dP(x, p_i, \eta_i|YZ_s) = dP(x, p_i, \eta_i|Y)\, \frac{P(Z_s|x, p_i, \eta_i Y)}{P(Z_s|Y)} = dP(x, p_i|Y_a)\, dP(\eta_i|Y_s)\, \frac{P(Z_s|x, p_i, \eta_i Y)}{P(Z_s|Y)} . \tag{4.18} $$

Now, as the condition $Z_s$ on the system was obtained independently of the apparatus
and after the measurement interaction, it will depend only on the final phase-space
point $\eta_f$ of the system; hence,

$$ P(Z_s|x, p_i, \eta_i Y) = \int_{\eta_f} dP(\eta_f|x, p_i, \eta_i Y)\, P(Z_s|\eta_f) . \tag{4.19} $$

Moreover, as this final point is connected via the mapping $\mu(\eta_i, x)$ in Eq. (4.10)
only to $\eta_i$ and x, we have

$$ dP(\eta_f|x, p_i, \eta_i Y) = d\eta_f\, \delta(\eta_f - \mu(\eta_i, x)) . \tag{4.20} $$

This allows us to eliminate the conditions that are irrelevant for $Z_s$ given x and $\eta_i$
in the likelihood ratio

$$ \frac{P(Z_s|x, p_i, \eta_i Y)}{P(Z_s|Y)} = \frac{P(Z_s|x\eta_i)}{P(Z_s|Y)} . \tag{4.21} $$

Thus the posterior distribution for the initial state reads

$$ dP(x, p_i, \eta_i|YZ_s) = dP(x, p_i|Y_a)\, dP(\eta_i|Y_s)\, L_{Z_s}(x\eta_i) , \tag{4.22} $$

where the likelihood factor is

$$ L_{Z_s}(x\eta_i) = \frac{1}{P(Z_s|Y)} \int d\eta_f\, \delta(\eta_f - \mu(\eta_i, x))\, P(Z_s|\eta_f) . \tag{4.23} $$

What we see therefore is that by including final conditions on the system, conditions
which are dynamically connected to the initial conditions $\eta_i$ and the reaction variable
x, the degrees of freedom of the system and the apparatus are correlated in a non-trivial
fashion in the posterior initial distribution.

The correlation is then propagated forward in time to the final posterior
distribution. As in the case of the prior distribution, the map

$$ dP(x, p, \eta|YZ_s; i) \to dP(x, p, \eta|YZ_s; f) \tag{4.24} $$

from posterior initial distribution to posterior final distribution is a reversible map
expressible in terms of the flow operator $e^{-\pounds_S}$. What is important to note, however,
is that the map from the prior initial distribution to the posterior final distribution,

$$ dP(x, p, \eta|Y; i) \to dP(x, p, \eta|YZ_s; f) , \tag{4.25} $$

is not a reversible map. The map is given by

$$ dP(x, p, \eta|YZ_s; f) = e^{-\pounds_S}\, dP(x, p, \eta|Y; i)\, L(x\eta) = dP(x, p - A(\eta)|Y_a; i)\, e^{-x\pounds_A}\, dP(\eta|Y_s; i)\, L(x\eta) \tag{4.26} $$

and hence involves two types of transformations:

a. an irreversible part corresponding to probability re-assessment, which is given
by multiplication by the likelihood factor.

b. a reversible part describing the phase flow, which is given by the action of the
operator $e^{-\pounds_S}$.



4.4 Sampling 

We shall find it convenient to re-express (4.18) in a manner in which the logical
dependence between the variables becomes more explicit. For this we take the
likelihood ratio and re-write it as:

$$ \frac{P(Z_s|x\eta_i)}{P(Z_s|Y)} = \frac{P(Z_s|x\eta_i)}{P(Z_s|xY_s)}\, \frac{P(Z_s|xY_s)}{P(Z_s|Y)} . \tag{4.27} $$

As one can then see from Bayes' theorem, the posterior probability of $\eta_i$ given x
is given by

$$ dP(\eta_i|xY_sZ_s) = dP(\eta_i|Y_s)\, \frac{P(Z_s|x\eta_i)}{P(Z_s|xY_s)} , \tag{4.28} $$

assuming, as we have done, that x is irrelevant to $\eta_i$ a priori. Similarly, the posterior
apparatus phase-space distribution is

$$ dP(x, p_i|YZ_s) = dP(x, p_i|Y_a)\, \frac{P(Z_s|xY_s)}{P(Z_s|Y)} . \tag{4.29} $$

This allows us to write the posterior initial distribution for the system and apparatus,
now in the passive sense, as

$$ dP(x, p, \eta|YZ_s; i) = dP(x, p|Y_a; i)\, dP(\eta|xY_sZ_s; i)\, L_{Z_s}(x) , \tag{4.30} $$
where the likelihood factor is now only a function of the reaction variable x; passively,
the likelihood factor is given by

(4.31)

where $P(Z_s|\eta; f)$ is the probability $P(Z_s|\eta)$ at the time referred to by $Z_s$,
propagated using the system's free evolution back to the time immediately after the
measurement. Finally, to obtain the final distribution from Eq. (4.30) we apply the
phase flow $\pounds_S$ as before.

To get some intuition for Eq. (4.30), think of the initial and final conditions
on the system as two distinct regions $R_i$ and $R_f$ in its phase space, as illustrated in
Fig. 4.1. Given a specific value of x, the initial region is then deformed by the phase
flow generated by $A(\eta)$ (i.e., the mapping $\mu(\eta_i, x)$) to some other region
$R_i(x) = e^{x\pounds_A} R_i$. The conjunction of the initial and final conditions given x corresponds
then to the region $R_{if}(x) = R_i(x) \cap R_f$ where the two regions overlap. This region
determines the posterior final distribution $dP(\eta|xY_sZ_s; f)$. The posterior initial
distribution is then obtained by "undoing" the flow on the intersecting regions.
Since the parameter x is uncertain, the specific intersecting region that is sampled
according to x becomes uncertain as well. The probability that a given region
$R_{if}(x)$ is sampled is then given by the posterior probability in x, which is the prior
probability times the likelihood factor $L_{Z_s}(x)$. Within this picture, the likelihood
factor is easily understood: it is proportional to the relative volume of the sampled
intersecting region $R_{if}(x)$.
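
The overlap picture can be sketched in a toy model. The example below is not taken from the text: it assumes a system with one coordinate q and momentum k, the choice $A(\eta) = q$ (so that the flow parameterized by x shifts k by x), rectangular regions $R_i$ and $R_f$, and a uniform prior over $R_i$. Under these assumptions the probability of the final condition given x is the fraction of $R_i$ carried into $R_f$.

```python
def overlap(a_lo, a_hi, b_lo, b_hi):
    """Length of the intersection of the intervals [a_lo, a_hi] and [b_lo, b_hi]."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))

def likelihood(x, r_i, r_f):
    """P(Z_s | x, Y_s) for a uniform prior on the rectangle
    r_i = (q_lo, q_hi, k_lo, k_hi): the fraction of R_i that the
    shear k -> k + x carries into the rectangle r_f."""
    q_lo, q_hi, k_lo, k_hi = r_i
    fq_lo, fq_hi, fk_lo, fk_hi = r_f
    area_i = (q_hi - q_lo) * (k_hi - k_lo)
    area_if = (overlap(q_lo, q_hi, fq_lo, fq_hi)
               * overlap(k_lo + x, k_hi + x, fk_lo, fk_hi))
    return area_if / area_i

# Hypothetical regions: no overlap at x = 0, maximal overlap at x = 1.5.
R_I = (0.0, 1.0, 0.0, 1.0)
R_F = (0.0, 1.0, 1.5, 2.5)
for x in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(x, likelihood(x, R_I, R_F))
```

With these regions the final condition is impossible without back-reaction, and the likelihood grows as x moves away from zero, which is the kind of situation discussed in Section 4.5.2.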

4.5 Conditional Distribution of The Data 

We now wish to see what the likelihood factor entails classically in terms of the
analysis of the data. Concentrating on the relevant variables, we introduce the
posterior distribution of A given x,

(4.32)

i.e., the probability of A within one of the intersecting regions $R_{if}(x)$. In terms of
this distribution, we then have for the initial posterior distributions of the pointer
variable

(4.33)

Figure 4.1: Phase space illustration of the posterior distribution (4.30). The initial
and final conditions correspond to the lightly shaded regions $R_i$ and $R_f$. The initial
region is then deformed by the transverse flow generated by A and parameterized by
the reaction variable x. A posterior distribution for $\eta$ given a value of x corresponds
to one of the darkly shaded overlap regions. The likelihood factor for a given x is
proportional to the volume of the corresponding region.
These we may now compare to the corresponding prior distributions, that is, without
the classical post-selection on the system:

$$ \begin{aligned} dP(p|Y; i) &= \int_x dP(x, p|Y_a; i) \\ dP(p|Y; f) &= \int_A dP(A|Y_s; i)\, dP(p - A|Y_a; i) . \end{aligned} \tag{4.34} $$

What we see here is an interesting situation that is somewhat evocative of the 
discussion in Chapter 2: 

We observe that in the prior case, the final distribution of the data takes the
simple separable form of a convolution with the prior distribution for the "signal"
A. On the other hand, the same form is not attained in the posterior case; instead,
separability is attained only in the mixed form, i.e., as in Eq. (2.15), with the role of
the "mixing parameter" in that equation now being played by the reaction variable
x of the apparatus. In other words, what Eq. (4.33) shows is that, in contrast to
the case of a pre-selected sample, prior information about the reaction variable is
relevant in the proper analysis of the data from a post-selected classical sample.
Recalling then the problem of separability discussed in Chapter 2 in regard to the
post-selected data, it is interesting to note that this variable is precisely the
one that in the quantum mechanical description cannot be controlled independently
of the pointer variable.

Before pursuing this connection with the quantum case any further, we would 
now like to note two interesting consequences brought about in the classical case by 
the fact that prior information about the reaction variable x becomes relevant in the 
posterior analysis. As we shall later see, both of these consequences have interesting 
parallels in the quantum case. 

For this, it is sufficient to look at the expectation value of the final pointer
reading $p_f$,

$$ \langle p_f\rangle = \langle p_i\rangle + \langle A\rangle . \tag{4.35} $$

Given initial conditions only, the two averages on the right-hand side are obtained
from the prior distributions

$$ \begin{aligned} \langle p_i\rangle &= \int_p dP(p|Y_a; i)\, p \\ \langle A\rangle &= \int_A dP(A|Y_s; i)\, A . \end{aligned} \tag{4.36} $$

On the other hand, given initial and final conditions, using Eq. (4.33), the posterior
averages are

$$ \begin{aligned} \langle p_i\rangle &= \int_{x,p} dP(xp|Y_a; i)\, L_{Z_s}(x)\, p \\ \langle A\rangle &= \int_x dP(x|Y_a)\, L_{Z_s}(x) \int_A dP(A|xYZ_s; i)\, A . \end{aligned} \tag{4.37} $$



4.5.1 Bias In The Data 

The first consequence has to do with bias in the readings. Suppose that given the
prior information only, the initial expectation value of $p_i$ vanishes. To gauge the
systematic average kick that the system imparts on the apparatus, when the system
belongs to the $Y_s$ sample, we would then establish the value p = 0 as our reference
origin. For our inference of $\langle A\rangle$ we would then take the mean value of our readings
of $p_f$.

However, we can see that the posterior expectation value of $p_i$ need not
vanish if the prior distribution $dP(x, p|Y_a; i)$ cannot be separated into a product of
its marginals $dP(x|Y_a; i)$ and $dP(p|Y_a; i)$, in other words, if x and $p_i$ are correlated
a priori. We must be careful therefore to account for a possible shift in the location
of the reference origin; otherwise, our assessment of the average kick from the $Y_sZ_s$
sample will be biased. From the practical standpoint, we can see that the problem
of bias may be dealt with by ensuring initial conditions on the apparatus such that

$$ dP(xp|Y_a; i) = dP(x|Y_a; i)\, dP(p|Y_a; i) . \tag{4.38} $$

This guarantees that the expectation value of $p_i$ and its variance remain the same
in the posterior distributions, and avoids any correlations between $p_i$ and A.
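
The origin of the bias can be checked with a simple numerical sketch. The model below is an assumption made only for illustration: a zero-mean Gaussian prior in (x, p) with unit variances and correlation coefficient `rho`, re-weighted by a likelihood factor $L(x) \propto e^{\beta x}$. For this choice the posterior mean of $p_i$ is $\rho\beta$, which vanishes only when the prior factorizes as in Eq. (4.38).

```python
import math

def posterior_mean_p(rho: float, beta: float,
                     half_width: float = 8.0, n: int = 321) -> float:
    """<p_i> after re-weighting a correlated Gaussian prior in (x, p)
    (unit variances, correlation rho) by L(x) proportional to exp(beta*x).
    Evaluated by a plain sum over a uniform grid."""
    h = 2.0 * half_width / (n - 1)
    grid = [-half_width + i * h for i in range(n)]
    num = 0.0
    den = 0.0
    for x in grid:
        like = math.exp(beta * x)  # likelihood factor, a function of x only
        for p in grid:
            prior = math.exp(-(x * x - 2.0 * rho * x * p + p * p)
                             / (2.0 * (1.0 - rho * rho)))
            num += prior * like * p
            den += prior * like
    return num / den

print(posterior_mean_p(rho=0.0, beta=0.5))  # separable prior
print(posterior_mean_p(rho=0.6, beta=0.5))  # correlated prior
```

The first call illustrates the unbiased case of Eq. (4.38); the second shows the shift of the reference origin induced by a prior correlation between x and $p_i$.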

4.5.2 Posterior Dependence on the Reaction Variable x of the 
Sampling Region 

The second, more relevant consequence is that the sampling distribution for A now 
becomes dependent on the initial conditions of the apparatus through the likelihood- 
weighted reaction variable x. To see this, let us view the posterior average of A as 
a double average 

$$ \langle A\rangle = \int_x dP(x|Y_a)\, L_{Z_s}(x)\, \langle A\rangle(x) \tag{4.39} $$

where $\langle A\rangle(x)$ is the posterior average of A given x, i.e., in the rough picture above,
the average of A within the intersecting region $R_{if}(x)$.

Consider then a situation, similar to a weak measurement, where the prior
distribution in x is very sharp around x = 0, so that a priori we expect the reaction
on the system to be small. Furthermore, suppose that the final condition itself is very
unlikely given no reaction, so that the overlap region between $R_i$ and $R_f$ is small.
Control for the final condition would then seem to be a way of isolating statistically
a small and unusual volume $R_{if}(x \simeq 0)$ in phase space where the dispersion in $A(\eta)$
may be small and the average of A could foreseeably correspond to a rare outlier in
the prior distribution of the data.

However, what determines the sampled region is not the prior distribution in
x, but rather its posterior distribution. If it happens then that as x is varied away
from zero the deformation $R_i(x)$ of the initial region increases the overlap volume,
then the Likelihood factor will increase away from zero also. This could then have
various effects on the posterior distribution for x depending on the shape of the
prior, which more or less fall into four categories (see Fig. 4.2):

If the prior distribution is sufficiently sharp but has tails, and its location
falls in a region where the likelihood factor increases in a certain direction, the main
effect of the latter will be to "shift" the center of the distribution in the
direction of increasing likelihood. The sampled region corresponds then to some
other region than the one expected a priori.

Also, if the location of the prior falls close to a local maximum or minimum
in the likelihood factor, the posterior distribution then exhibits a "squeeze" or
"stretch" effect, respectively. Near a minimum, diminished likelihood at the prior
region of expectation increases the odds at the tails. Near a maximum, there is higher
"confidence" in the prior region of expectation, and hence diminished tails.

Finally, it may also happen close to a point of minimum likelihood that,
if the prior is not sufficiently sharp, the effect of the likelihood factor is to produce
"dents" corresponding to two or more predominantly sampled regions. Again one may
expect the sampled volume in phase space to increase and, most likely, an increase in the
dispersion of A.

Thus, if the final condition is used as a means of further delimiting the 
sample in phase-space, care must be taken to ensure that the prior distribution is 
sufficiently bounded so that the sampled region is indeed the region of interest. As 
we shall see later, these likelihood-induced effects on the posterior distribution have 
counterparts in the quantum mechanical case with other interesting consequences. 
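
The "shift", "squeeze" and "stretch" effects can be reproduced with a Gaussian prior and simple likelihood shapes. The following sketch is only illustrative: the prior width, the three likelihood functions and the helper name `posterior_stats` are assumptions, and the likelihoods are defined only up to normalization.

```python
import math

def posterior_stats(likelihood, sigma: float = 1.0,
                    half_width: float = 10.0, n: int = 2001):
    """Mean and variance of the posterior prior(x) * L(x) on a uniform grid,
    for a zero-mean Gaussian prior of width sigma."""
    h = 2.0 * half_width / (n - 1)
    xs = [-half_width + i * h for i in range(n)]
    w = [math.exp(-x * x / (2.0 * sigma ** 2)) * likelihood(x) for x in xs]
    norm = sum(w)
    mean = sum(x * wi for x, wi in zip(xs, w)) / norm
    var = sum((x - mean) ** 2 * wi for x, wi in zip(xs, w)) / norm
    return mean, var

# Likelihood increasing in one direction: the posterior is shifted.
shift = posterior_stats(lambda x: math.exp(0.5 * x))
# Local maximum of the likelihood at the prior location: squeeze.
squeeze = posterior_stats(lambda x: math.exp(-0.5 * x * x))
# Local minimum of the likelihood at the prior location: stretch.
stretch = posterior_stats(lambda x: math.exp(0.25 * x * x))
print(shift, squeeze, stretch)
```

For these Gaussian-compatible shapes the posterior stays Gaussian, so the three cases change only its mean or its variance; a likelihood minimum combined with a broader prior would be needed to produce the two-peaked "dent" case.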

4.6 A Classical System with Dirichlet Boundary Conditions

In the classical description of the measurement, prior knowledge of the reaction
variable x becomes entirely relevant when the statistics are analyzed against initial
and final conditions imposed on the system. As we have seen so far in this chapter,
within a specific, classically post-selected ensemble, the reaction variable parameterizes
the effective region in phase space from which the measured variable A is
sampled; in other words, the sampled average of A acquires an implicit x dependence.
This dependence is reminiscent of the parametric dependence on x of the
local weak value a(x), which we introduced in the previous chapter.

Figure 4.2: Four possible effects of the likelihood factor on a prior distribution

However, the classical analysis also shows that when it comes to the probabil- 
ities for the possible values of x over which these x-dependent regions are sampled, 
these are not the probabilities assigned on the basis of our initial preparation of 
the apparatus. The prior probabilities are re-assessed by a likelihood factor which
depends on the posterior sampling volume corresponding to a given value of x. This 
analogy with classical probability re-assessment is still missing within our picture 
of sampling weak values. Our intention in this section is to pursue the classical 
analysis one step further in order to establish, both formally and quantitatively, a 
direct correspondence between the classical and quantum mechanical descriptions of 
the measurement. In this way we hope to motivate the interpretation of our model. 

Now, as we mentioned in the introduction, the idea behind the two-vector 
formulation is that the full description of the system is contained in the pair of 
initial and final wave functions. Otherwise, either one of the wave functions tells 
us only "half of the story" . To pursue the analogy, we shall therefore specialize the 
analysis of the previous section to a particular set of classical boundary conditions 
on the system, which can be realized within the quantum mechanical description, 
and which classically exhibit the property of telling the full story only by their 
conjunction. 

As is well known, in the Hamiltonian formulation, the trajectory of a system
is completely determined by specifying the initial configuration variables $\{q_i\}$ and a
set of conjugate momenta which can be inverted to yield the initial velocities $\{\dot{q}_i\}$.
On the other hand, there is also the Lagrangian formulation, where the trajectory
of the system is determined from Dirichlet boundary conditions, in other words, by
specifying the values of the $\{q\}$ at two moments in time.

Now, given these conditions, the response of the apparatus is completely 
determined if in addition one knows the precise value of the reaction variable x. On 
the other hand, if x is uncertain, then further information on the trajectory is needed 
to precisely determine the response of the apparatus. The available control (or lack
thereof) over this additional information is what determines the likelihood factor
in the posterior probability for x. As we shall now see, the conditional response 
of the apparatus, as well as the likelihood factor, can both be derived from the 
extremal action function. This will allow us to make a connection with the quantum 
mechanical situation. 
Consider therefore a system described by a single configuration variable q,
with a free Lagrangian $\mathcal{L}_0(q, \dot{q})$. We assume that all our information about the
system consists of its configuration $q_1$ at some initial time $t_1$, and the configuration
$q_2$ at some final time $t_2$. For simplicity, we shall specialize to the case in which the
measured observable is a function of the configuration variable only, i.e.

$$ A(\eta) = A(q) , \tag{4.40} $$

measured at the intermediate time $t_i$. This choice affords a considerable simplification,
as the total Lagrangian is simply the free Lagrangian minus the measurement
Hamiltonian:

$$ \mathcal{L}(q, \dot{q}, t) = \mathcal{L}_0(q, \dot{q}) + x\, \delta(t - t_i)\, A(q) . \tag{4.41} $$

As before, we disregard the free dynamics of the apparatus.

Now, given the two boundary conditions, the trajectory of the system is
completely determined once x is known. The trajectory is the one for which the
action functional

$$ S[q(t), x] = \int_{t_1}^{t_2} dt \left[ \mathcal{L}_0(q, \dot{q}) + x\, \delta(t - t_i)\, A(q) \right] \tag{4.42} $$

is stationary, and thus corresponds to the solution of the Euler-Lagrange equations

$$ -\frac{\delta S[q(t)]}{\delta q(t)} = \frac{d}{dt} \frac{\partial \mathcal{L}_0}{\partial \dot{q}} - \frac{\partial \mathcal{L}_0}{\partial q} - x\, \delta(t - t_i)\, A'(q) = 0 \tag{4.43} $$

subject to the boundary conditions $q(t_1) = q_1$ and $q(t_2) = q_2$. These equations
describe the motion of the system under its free evolution, except at the time $t = t_i$,
where it receives a "kick", proportional to the gradient of A, the intensity and sign
of which is given by the reaction variable x.

In all fairness, one should note that there may be more than one solution.
In fact, one may generally expect this to be the case if A(q) is a non-linear function
of q and x and/or the time $t_2 - t_1$ are not sufficiently small. We shall assume
this is not the case, although the extension is interesting in its own right and can
be handled without major difficulty.

Supposing then a unique trajectory, the configuration variable of the system
as well as every other dynamical variable is completely specified by the initial and
final conditions $q_1$ and $q_2$ and the value of the reaction variable x. Let us write this
solution as

$$ q_{12}(t; x) . \tag{4.44} $$

Note that even if the free evolution is simple, the parametric dependence of the
Lagrangian on x and A(q) may generally make this trajectory quite complicated
both as a function of x and t.

Now, since we know that the pointer variable of the apparatus responds
to the function A(q), the response of the pointer variable now becomes an implicit,
generally non-linear function of x. Let us write this function, somewhat suggestively,
as

$$ a_{12}(x) = A(q_{12}(t_i; x)) . \tag{4.45} $$

Now define the action functional evaluated at the extremal trajectory as the
"extremal action function"

$$ S_{12}(x) = S[q_{12}(t; x), x] . \tag{4.46} $$

As is then easy to show, $a_{12}(x)$ can be obtained from a variation of this function:

$$ a_{12}(x) = \frac{dS_{12}(x)}{dx} . \tag{4.47} $$

For this we note that

$$ \frac{d}{dx} S[q_{12}(t; x), x] = \int dt \left[ \frac{\delta S}{\delta q(t)} \frac{dq_{12}(t; x)}{dx} + \delta(t - t_i)\, A(q_{12}(t; x)) \right] . \tag{4.48} $$

The first term in the brackets is the implicit variation with respect to the trajectory,
which vanishes by the Euler-Lagrange equations; thus, what remains is the explicit
variation of the action in the second term.

Hence, we observe that once the endpoints of the trajectory are perfectly
determined, the classical conditional response of the pointer variable is essentially
as if the apparatus had been subject to an effective, impulsive potential

$$ V_{\mathrm{eff}} = -\delta(t - t_i)\, S_{12}(x) , \tag{4.49} $$

which then determines a generally non-linear impulse of the pointer variable

$$ p_f = p_i + S'_{12}(x) = p_i + a_{12}(x) . \tag{4.50} $$

The extent to which this kick is precisely defined now depends exclusively on the 
extent to which the reaction variable x is controlled. 
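
The relation $a_{12}(x) = dS_{12}/dx$ can be verified in a simple case. The sketch below is an illustration under stated assumptions, not an example from the text: a free particle of mass m with $\mathcal{L}_0 = m\dot{q}^2/2$, the non-linear choice $A(q) = q^2$, and time intervals `tau1` $= t_i - t_1$ and `tau2` $= t_2 - t_i$. The kick then changes the momentum by $2xq(t_i)$, and the trajectory consists of two straight segments joined at $q(t_i)$.

```python
def midpoint(x, q1, q2, tau1, tau2, m=1.0):
    """q_12(t_i; x): position at the time of the kick for A(q) = q**2.
    Follows from matching m*v2 - m*v1 = 2*x*q(t_i) across the kick.
    Requires tau1 + tau2 + 2*x*tau1*tau2/m != 0 (unique trajectory)."""
    return (q2 * tau1 + q1 * tau2) / (tau1 + tau2 + 2.0 * x * tau1 * tau2 / m)

def extremal_action(x, q1, q2, tau1, tau2, m=1.0):
    """S_12(x): free action of the two segments plus x * A(q(t_i))."""
    qi = midpoint(x, q1, q2, tau1, tau2, m)
    v1 = (qi - q1) / tau1
    v2 = (q2 - qi) / tau2
    return 0.5 * m * v1 ** 2 * tau1 + 0.5 * m * v2 ** 2 * tau2 + x * qi ** 2

def a12(x, q1, q2, tau1, tau2, m=1.0):
    """Conditional response of the pointer, A evaluated on the extremal path."""
    return midpoint(x, q1, q2, tau1, tau2, m) ** 2

def dS_dx(x, q1, q2, tau1, tau2, m=1.0, h=1e-5):
    """Central finite-difference derivative of S_12 with respect to x."""
    return (extremal_action(x + h, q1, q2, tau1, tau2, m)
            - extremal_action(x - h, q1, q2, tau1, tau2, m)) / (2.0 * h)

args = (0.3, 0.0, 1.0, 1.0, 1.0)  # x, q1, q2, tau1, tau2 (illustrative values)
print(a12(*args), dS_dx(*args))
```

In this toy case the response $a_{12}(x)$ is a non-linear function of x even though the free motion is trivial, as anticipated in the remark following Eq. (4.44).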

Suppose therefore that there is some a priori uncertainty in x. To gauge
the mechanical effect of the system on the apparatus we must then consider the
posterior distribution for the initial state of the apparatus, given the cited boundary
conditions on the system. Using Bayes' theorem we have

$$ dP(xp_i|Y_a q_2 q_1) = dP(xp_i|Y_a)\, \frac{dP(q_2|xq_1)}{dP(q_2|Y_a q_1)} , \tag{4.51} $$






where we assume no prior dependence of the apparatus on the initial conditions of
the system. Next, we compute the Likelihood factor

$$ L_{12}(x) = \frac{P(q_2|xq_1)}{P(q_2|Y_a q_1)} . \tag{4.52} $$

This is a more delicate computation, as the final condition $q_2$ is obviously not determined by the initial condition $q_1$ and $x$ alone, but rather by the map $q_2 = q(t_2, x, q_1, k_1)$, where $k_1$ is the momentum conjugate to $q_1$. In accordance with what was stated earlier, once $x$ becomes uncertain, prior information about the momentum becomes relevant. To find the probability $P(q_2 \mid x\, q_1)$ we must then assign a-priori probabilities to $k_1$. Clearly, knowledge of $x$ and $q_1$ entails no knowledge of the initial momentum $k_1$. Hence, we appeal to the Gibbs postulate of equal a priori probabilities in phase space consistent with our known information. This assumption poses a slight problem, as the momentum is not bounded; but, since we are only interested in likelihood ratios, we may use a limiting sequence of bounded flat distributions, all of which lead to normalizable distributions. When the bounds are taken to infinity,



$$L_{12}(x) \propto \int dP(k_1 \mid x\, q_1)\, P(q_2 \mid x\, q_1 k_1) \propto \int dk_1\, \delta\big(q_2 - q(t_2, x, q_1, k_1)\big). \tag{4.53}$$

Now, with a single extremal solution, the integral picks up the value of the momentum $k_1$ at $t_1$ determined by the extremal solution. What is left after integration is then the Jacobian

$$L_{12}(x) \propto \left| \frac{\partial k_1}{\partial q_2} \right|. \tag{4.54}$$



Now we use the well-known fact that the extremal action is the generating function of canonical transformations in time [23]. This means that the initial and final momenta defined by the Lagrangian $\mathcal{L}$ can be obtained from the variation with respect to the initial and final coordinates:

$$k_2 = \frac{\partial S_{12}}{\partial q_2}, \qquad k_1 = -\frac{\partial S_{12}}{\partial q_1}. \tag{4.55}$$

Variation of $k_1$ with respect to $q_2$ then gives us, for the likelihood factor,

$$L_{12}(x) \propto \left| \frac{\partial^2 S_{12}(x)}{\partial q_1\, \partial q_2} \right| \equiv \left| \partial_1 \partial_2 S_{12}(x) \right|. \tag{4.56}$$



This quantity is well known in Hamilton-Jacobi mechanics [23]: it is the so-called
Van Vleck determinant, or the "density of paths".
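
The equivalence between the Jacobian of Eq. (4.54) and the mixed second derivative of the action in Eq. (4.56) can be checked numerically. The following is a minimal sketch, assuming (for illustration only) a one-dimensional harmonic oscillator with unit mass and frequency and no measurement interaction; the function names are hypothetical and not part of the text.

```python
import math

def action(q1, q2, T, m=1.0, w=1.0):
    """Extremal action of a harmonic oscillator going from q1 to q2 in time T."""
    s, c = math.sin(w * T), math.cos(w * T)
    return m * w / (2.0 * s) * ((q1 * q1 + q2 * q2) * c - 2.0 * q1 * q2)

def mixed_derivative(q1, q2, T, h=1e-3):
    """Central finite-difference estimate of d^2 S / (dq1 dq2)."""
    return (action(q1 + h, q2 + h, T) - action(q1 + h, q2 - h, T)
            - action(q1 - h, q2 + h, T) + action(q1 - h, q2 - h, T)) / (4.0 * h * h)

def final_position(q1, k1, T, m=1.0, w=1.0):
    """Classical flow q2 = q(T; q1, k1) for the same oscillator."""
    return q1 * math.cos(w * T) + k1 / (m * w) * math.sin(w * T)

def jacobian_dk1_dq2(q1, k1, T, h=1e-3):
    """|dk1/dq2| at fixed q1, computed as 1 / |dq2/dk1|."""
    dq2_dk1 = (final_position(q1, k1 + h, T) - final_position(q1, k1 - h, T)) / (2.0 * h)
    return 1.0 / abs(dq2_dk1)

T = 0.7
van_vleck = abs(mixed_derivative(0.3, -0.2, T))   # |d1 d2 S_12|
jacobian = jacobian_dk1_dq2(0.3, 0.5, T)          # |dk1/dq2|
exact = 1.0 / abs(math.sin(T))                    # m*w/|sin(w*T)| with m = w = 1
```

For this quadratic action both quantities are independent of the end-points, so the two estimates agree with the closed form up to rounding.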
Hence, we observe that in terms of the likelihood factor and the effective
impulsive potential, the passive map from the prior initial to posterior final phase- 
space distributions of the apparatus becomes 



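$$dP(x, p \mid Y_a Y_s; f) = e^{\mathcal{L}_{S_{12}(x)}}\, dP(x, p \mid Y_a; i)\, \frac{\left| \partial_1 \partial_2 S_{12}(x) \right|}{\int_x dP(x \mid Y_a; i)\, \left| \partial_1 \partial_2 S_{12}(x) \right|}, \tag{4.57}$$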



and therefore the final distribution of the pointer variable is given by 



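$$dP(p \mid Y_a Y_s; f) = \int_x dP\big(x,\, p - \alpha_{12}(x) \mid Y_a; i\big)\, \frac{\left| \partial_1 \partial_2 S_{12}(x) \right|}{\int_x dP(x \mid Y_a; i)\, \left| \partial_1 \partial_2 S_{12}(x) \right|}. \tag{4.58}$$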



We note finally that in this classical example it is clear what constitutes a mechanical effect of the system on the apparatus, namely the reversible phase-flow induced by the effective potential, and what constitutes a re-assessment of prior probabilities, namely the irreversible multiplication by the likelihood factor $|\partial_1 \partial_2 S_{12}(x)|$. This formal distinction will serve as the guiding principle in the interpretation of the non-linear model, to which we now turn.

Chapter 5

The Non-Linear Bayesian Model



5.1 Semi-Classical Correspondence

As mentioned earlier, there exist in quantum mechanics initial and final boundary 
conditions which correspond to conditions that completely determine the microstate 
of the system in the classical description. One should therefore expect that in a pre- 
and post-selection with such conditions, and given additional semi-classical condi- 
tions where quantum inertial effects may also be neglected on the side of the system, 
the conditional response of the apparatus should exhibit a correspondence with the 
classical description of the measurement. We now investigate this correspondence. 

Let us then analyze the conditional response of the apparatus, now quantum mechanically, when the system is pre- and post-selected on eigenstates of the configuration variable and the measured observable is some function $\hat{A} = A(\hat{q})$ measured at the intermediate time. In this case, using Eq. (2.18) for the relative state of the apparatus, we have

$$|\phi_f\rangle \propto \langle q_2, t_2; t_i |\, e^{i\hat{A}\hat{x}}\, | q_1, t_1; t_i \rangle\, |\phi_i\rangle, \tag{5.1}$$

where the transition amplitude is

$$\langle q_2, t_2; t_i |\, e^{i\hat{A}x}\, | q_1, t_1; t_i \rangle \equiv \langle q_2 |\, e^{-i\hat{H}_0 (t_2 - t_i)}\, e^{i\hat{A}x}\, e^{-i\hat{H}_0 (t_i - t_1)}\, | q_1 \rangle \tag{5.2}$$

and $\hat{H}_0$ is the free evolution Hamiltonian of the system. We recognize the transition amplitude as the propagator for the Schrödinger equation for the system with the Hamiltonian $\hat{H}_0 - \delta(t - t_i)\hat{A}x$.

Now, it is well known that in the semi-classical regime, i.e., for small times $t_2 - t_1$, and/or to leading order in powers of $\hbar/M$, the solution to the propagator is given by the WKB approximation [24]:



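$$\langle q_2, t_2; t_i |\, e^{i\hat{A}x}\, | q_1, t_1; t_i \rangle_{\mathrm{WKB}} \simeq (2\pi i)^{-1/2}\, e^{i S_{12}(x)}\, \sqrt{\left| \partial_1 \partial_2 S_{12}(x) \right|}, \tag{5.3}$$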

where $S_{12}(x)$ is the extremal action evaluated at the classical path, and the factor $\partial_1 \partial_2 S_{12}(x)$ is the classical density of paths, the same terms encountered at the end of the previous chapter. Hence, the state of the apparatus relative to this transition is, in the WKB approximation,

$$|\phi_f^{(12)}\rangle \propto e^{i S_{12}(\hat{x})}\, \sqrt{\left| \partial_1 \partial_2 S_{12}(\hat{x}) \right|}\; |\phi_i\rangle, \tag{5.4}$$

where the phase factor and the square root of the density of paths are now regarded as operator-valued functions of $\hat{x}$. Let us then establish some parallels with the corresponding classical description, i.e., Eq. (4.57):

The "Kick": Consider first the transformation $e^{i S_{12}(\hat{x})}$ defined by the phase factor. Viewing it as an operator-valued function of $\hat{x}$, we note that it is a unitary transformation; hence, as in the classical case, it corresponds to a reversible transformation. Moreover, its effect on the pointer variable operator $\hat{p}$, as seen in the Heisenberg picture, is

$$e^{-i S_{12}(\hat{x})}\, \hat{p}\, e^{i S_{12}(\hat{x})} = \hat{p} + \alpha_{12}(\hat{x}), \tag{5.5}$$

which is essentially the "quantized" version of the classical conditional response of the pointer variable, i.e., Eq. (4.50). Now, as we recall, $\alpha_{12}(x)$ in the classical case is the function $A(q)$ evaluated at the trajectory that solves the Euler-Lagrange equations of the system with the measurement back-reaction term. On the other hand, we recall from an earlier discussion that the phase gradient is the local weak value of $\hat{A}$,

$$\alpha_{12}(x) = \mathrm{Re}\, \frac{\langle q_2, t_2; t_i |\, \hat{A}\, e^{i\hat{A}x}\, | q_1, t_1; t_i \rangle}{\langle q_2, t_2; t_i |\, e^{i\hat{A}x}\, | q_1, t_1; t_i \rangle}. \tag{5.6}$$

Hence we see that in terms of the equations of motion entailed by the action $S_{12}(x)$, the picture we have so far suggested, namely, that of the pointer responding to the weak value of $\hat{A}$, has a direct classical correspondence. The picture becomes, under semi-classical conditions on the system, the same classical picture in which the pointer variable responds to the value of $A(q)$ on the classical trajectory with measurement back-reaction given a definite value of $x$.
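
The Heisenberg-picture relation (5.5) can be verified directly in the $x$-representation. The following worked step is a sketch, assuming the convention $\hat{p} = -i\,\partial_x$ (units with $\hbar = 1$) and an arbitrary differentiable test function $\phi(x)$, neither of which is spelled out at this point in the text:

```latex
\begin{align*}
e^{-iS_{12}(x)} \left(-i\,\partial_x\right) \left[ e^{iS_{12}(x)}\, \phi(x) \right]
  &= e^{-iS_{12}(x)} \left(-i\right) \left[ i S_{12}'(x)\, e^{iS_{12}(x)}\, \phi(x)
     + e^{iS_{12}(x)}\, \phi'(x) \right] \\
  &= S_{12}'(x)\, \phi(x) - i\, \phi'(x) \\
  &= \left[ \left( \hat{p} + \alpha_{12}(\hat{x}) \right) \phi \right](x),
  \qquad \alpha_{12}(x) \equiv S_{12}'(x).
\end{align*}
```

Since $\phi$ is arbitrary, the operator identity follows; the kick is the gradient of the phase, as in the classical relation (4.50).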

"Sampling": Consider then the hermitian operator $\sqrt{|\partial_1 \partial_2 S_{12}(\hat{x})|}$ in Eq. (5.4). Computing the normalization of the relative state, we see that the probability distribution for $x$, $dx\, |\phi_f(x)|^2$, is given by

$$dP(x \mid \phi_f) = dP(x \mid \phi_i)\, \frac{\left| \partial_1 \partial_2 S_{12}(x) \right|}{\int_x dP(x \mid \phi_i)\, \left| \partial_1 \partial_2 S_{12}(x) \right|}, \tag{5.7}$$

where $dP(x \mid \phi_i) = dx\, |\phi_i(x)|^2$. This expression has exactly the same form as the marginal posterior probability distribution for $x$ obtained in the classical Bayesian analysis, i.e.,

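$$dP(x \mid Y_a Y_s; f) = dP(x \mid Y_a; i)\, \frac{\left| \partial_1 \partial_2 S_{12}(x) \right|}{\int_x dP(x \mid Y_a; i)\, \left| \partial_1 \partial_2 S_{12}(x) \right|}. \tag{5.8}$$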

In other words, we see that the passage from the initial to the final relative distri- 
bution in x in the quantum case, exactly parallels the classical passage from prior 
to posterior distribution. 

Observe therefore that if we define in the quantum case a corresponding likelihood factor as

$$L_{12}(x) = \frac{\left| \partial_1 \partial_2 S_{12}(x) \right|}{\langle \phi_i |\, \left| \partial_1 \partial_2 S_{12}(\hat{x}) \right|\, | \phi_i \rangle}, \tag{5.9}$$

then the final conditional expectation value of the kick on the pointer variable, i.e.,

$$\langle \phi_f |\, \alpha_{12}(\hat{x})\, | \phi_f \rangle = \int_x dP(x \mid \phi_i)\, L_{12}(x)\, \alpha_{12}(x), \tag{5.10}$$

can be made to coincide with the posterior expectation value of $\alpha_{12}(x)$ in the classical description. As we recall from the last chapter, in the pre- and post-selected classical measurement it is the posterior distribution in $x$ which determines the sampling distribution for $A$. Thus, by identifying $dP(x \mid \phi_i)\, L_{12}(x)$ as a posterior probability distribution in $x$, the picture of sampling the weak value $\alpha(x)$ also has a direct correspondence with the classical picture of sampling; it corresponds to the sampling of $A(q)$ from regions in the system's phase-space parameterized by the reaction variable $x$.

"Bias": Finally, we recall that in the classical case, the systematic effect of the system on the pointer variable is gauged from the posterior average of its initial expectation value. Let us then re-express $|\phi_f^{(12)}\rangle$ as

$$|\phi_f^{(12)}\rangle = e^{i S_{12}(\hat{x})}\, \sqrt{L_{12}(\hat{x})}\; |\phi_i\rangle, \tag{5.11}$$

where we now view the square root of the likelihood factor (5.9) as an operator. If we then look at the expectation value of $\hat{p}$ for the relative state, we find from (5.5) that

$$\langle p_f \rangle = \langle \phi_i |\, \sqrt{L_{12}(\hat{x})}\; \hat{p}\; \sqrt{L_{12}(\hat{x})}\, | \phi_i \rangle + \langle \phi_i |\, L_{12}(\hat{x})\, \alpha(\hat{x})\, | \phi_i \rangle. \tag{5.12}$$

The first term on the right hand side is what corresponds in the classical case to the posterior expectation value of $p_i$. Here the parallel is slightly less direct, as $\hat{x}$ and $\hat{p}$ do not commute. Nevertheless, this expectation value still reproduces qualitatively the classical property that if $x$ and $p$ are not independent a-priori, then the posterior expectation value of $p_i$ need not coincide with its prior expectation value. The problem of bias is dealt with classically by factoring the phase-space distributions, thus preventing prior correlations between $x$ and $p$. In the quantum case, it is clear that $x$ and $p$ cannot be separated in this way, as the two variables are intrinsically constrained by the uncertainty principle. Nevertheless, it is still possible to establish a general condition to eliminate the bias in the expectation value of $p$, which resembles the classical no-correlation condition. Notice that if the initial wave function $\phi_i(x)$ in the $x$-representation can be written as

$$\phi_i(x) = R(x)\, e^{i p_i x}, \tag{5.13}$$

where $R(x)$ is a real function of $x$, then since $\sqrt{L_{12}(x)}$ is real as well,

$$\langle \phi_i |\, \sqrt{L_{12}(\hat{x})}\; \hat{p}\; \sqrt{L_{12}(\hat{x})}\, | \phi_i \rangle = \langle \phi_i |\, \hat{p}\, | \phi_i \rangle = p_i. \tag{5.14}$$
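
The step behind Eq. (5.14) can be spelled out in the $x$-representation. What follows is a sketch under stated assumptions: $\hat{p} = -i\,\partial_x$ with $\hbar = 1$, the product $L_{12}(x) R(x)^2$ vanishes at infinity, and $\int dx\, L_{12}(x) R(x)^2 = 1$, which is the normalization implied by the definition (5.9).

```latex
\begin{align*}
\langle \phi_i |\, \sqrt{L_{12}}\; \hat{p}\; \sqrt{L_{12}}\, | \phi_i \rangle
  &= \int dx\; \sqrt{L_{12}}\, R\; e^{-i p_i x}
     \left( -i\,\partial_x \right)
     \left[ \sqrt{L_{12}}\, R\; e^{i p_i x} \right] \\
  &= p_i \int dx\; L_{12}\, R^2
     \;-\; i \int dx\; \left( \sqrt{L_{12}}\, R \right)
       \partial_x \left( \sqrt{L_{12}}\, R \right) \\
  &= p_i \;-\; \frac{i}{2} \Big[ L_{12}\, R^2 \Big]_{-\infty}^{+\infty}
   \;=\; p_i .
\end{align*}
```

The second integral is a total derivative because $\sqrt{L_{12}}\, R$ is real, which is exactly where the "real state" condition enters.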

Since the particular choice of $p_i$ entails no loss of generality (it only amounts to a redefinition of the momentum reference origin), we shall refer to states for which the wave function in $x$ is of the form (5.13) as "real states". One can then show that when $|\phi_i\rangle$ is a real state, the second-order "correlation" function between $p$ and any function $f(x)$ which falls off faster than $1/R(x)^2$ at infinity vanishes,

$$\langle \Delta p\, \Delta f \rangle = \langle \phi_i |\, \tfrac{1}{2} \{ \hat{p}, f(\hat{x}) \}\, | \phi_i \rangle - p_i\, \langle \phi_i |\, f(\hat{x})\, | \phi_i \rangle = 0. \tag{5.15}$$

Thus we see that by imposing a fairly general "no-correlation" condition on the initial state of the apparatus, the systematic effect on the pointer variable can be gauged, both classically and quantum-mechanically, from the same fixed reference origin $p_i$. Let us also note in passing an additional desirable feature of real states with regard to bias: if $\phi_i(x)$ is real, then the real and imaginary parts of its Fourier transform $\phi_i(p)$ have definite parities with respect to reflections about the reference origin $p_i$; since, therefore, $|\phi_i(p)|^2$ is symmetric about this origin, any asymmetry in the distribution of the data can be attributed exclusively to the effect of the unitary transformation $e^{i S_{12}(\hat{x})}$.

Now, taking note of the three above parallels, we then draw the following 
conclusion: 

• if the system satisfies the semi-classical conditions, in other words, both in 
terms of initial and final boundary conditions as well as inertial conditions, 

• then the conditional expectation value $\langle \alpha_{12}(\hat{x}) \rangle$ can be interpreted, both in
the classical and quantum descriptions, as the same systematic effect on the
average momentum of the apparatus,

• provided that in the quantum case the "reference" initial quantum state of
the apparatus is taken to be the state $\sqrt{L_{12}(\hat{x})}\, |\phi_i\rangle$, as opposed to the state
$|\phi_i\rangle$ that was initially assigned without prior knowledge of the "destiny" of the
system.

For the purpose of "calibrating" the model, we shall then make the reasonable assumption that in the classical limit $\hbar/M \to 0$ on the system, the conditional expectation value corresponds in both descriptions to the same mechanical effect. This leads us then to the formulation of the model under more general non-classical conditions on the system.

5.2 The Model 

For any given transition $|\psi_1\rangle \to |\psi_\mu\rangle$ of the system, we can write the amplitude function as a real function times a phase,

$$\langle \psi_\mu |\, e^{i\hat{A}x}\, | \psi_1 \rangle = \sqrt{P(\psi_\mu \mid x\, \psi_1)}\; e^{i S_\mu(x)}, \tag{5.16}$$

where $P(\psi_\mu \mid x\, \psi_1)$ is the transition probability from $|\psi_1\rangle$ to $|\psi_\mu\rangle$, given a definite unitary transformation of the system $e^{i\hat{A}x}$; the square root in Eq. (5.16) is allowed to take either sign to ensure continuity in the decomposition. Now, given an initial preparation of the apparatus $|\phi_i\rangle$, the relative final state for the apparatus can then be written as

$$|\phi_f^{(\mu)}\rangle = \frac{1}{\sqrt{P(\psi_\mu \mid \phi_i\, \psi_1)}}\, \sqrt{P(\psi_\mu \mid \hat{x}\, \psi_1)}\; e^{i S_\mu(\hat{x})}\, |\phi_i\rangle, \tag{5.17}$$

where we now interpret the normalization factor as the transition probability given this preparation. As we can see, this transition probability satisfies the product rule of probability in the $x$-representation,

$$P(\psi_\mu \mid \phi_i\, \psi_1) = \langle \phi_i |\, P(\psi_\mu \mid \hat{x}\, \psi_1)\, | \phi_i \rangle = \int_x dP(x \mid \phi_i)\, P(\psi_\mu \mid x\, \psi_1), \tag{5.18}$$

and may thus be interpreted, quite intuitively, as the average transition probability when the intermediate transformation is sampled with the initial distribution $dP(x \mid \phi_i)$.

Drawing then from the parallel established in the previous section, we now interpret $S_\mu(x)$ as an effective action and infer an underlying action-reaction picture from the unitary operator $e^{i S_\mu(\hat{x})}$: for a given transformation on the system, generated by $\hat{A}$ and parameterized by $x$, there is a corresponding back-reaction on the variable $p$ conjugate to the transformation parameter. The reaction on the apparatus is an impulse proportional to the variation of the action

$$\delta p = S'_\mu(x) = \alpha_\mu(x). \tag{5.19}$$

The impulse is then given by 

$$\alpha_\mu(x) = \mathrm{Re}\, \frac{\langle \psi_\mu |\, \hat{A}\, e^{i\hat{A}x}\, | \psi_1 \rangle}{\langle \psi_\mu |\, e^{i\hat{A}x}\, | \psi_1 \rangle}, \tag{5.20}$$

the weak value of the generator A with parameter value x. 

Similarly, we infer a probabilistic picture by noting that the irreversible trans- 
formation 

$$|\phi_i\rangle \to \sqrt{\frac{P(\psi_\mu \mid \hat{x}\, \psi_1)}{P(\psi_\mu \mid \phi_i\, \psi_1)}}\; |\phi_i\rangle \tag{5.21}$$

corresponds, in the x-representation, to the square root of a probability re-assessment 
with the likelihood factor 

$$L_\mu(x) = \frac{P(\psi_\mu \mid x\, \psi_1)}{P(\psi_\mu \mid \phi_i\, \psi_1)}. \tag{5.22}$$



We interpret this factor then as a generalized weight factor which in the semi-classical approximation corresponds to the "density of paths" in phase space. We define therefore a re-assessed initial state of the apparatus according to this "likelihood transformation",

$$|\phi_i^{(\mu)}\rangle = \sqrt{L_\mu(\hat{x})}\; |\phi_i\rangle, \tag{5.23}$$

denoted then as the initial state of the apparatus relative to the transition $|\psi_1\rangle \to |\psi_\mu\rangle$, or the relative initial state for short.

This state will then serve as the "reference frame" in order to gauge the mechanical effect of the system on the apparatus, in accordance with the results of the previous section. Thus, the relative final state is given by

$$|\phi_f^{(\mu)}\rangle = e^{i S_\mu(\hat{x})}\, |\phi_i^{(\mu)}\rangle. \tag{5.24}$$

The distribution of the data may then be analyzed, as in Chapter 2, within the picture of sampling weak values, i.e., as a superposition of local weak measurements, except that instead of $|\phi_i\rangle$ we now use the relative initial state $|\phi_i^{(\mu)}\rangle$. The results of that chapter regarding the means and variances of the pointer variable then follow in analogous fashion. We shall concentrate on these in further detail in Sec. 5.6.

Let us for now give a closed-form expression for the conditional probability distribution of the data, following from Eq. (5.24):

$$dP(p \mid \phi_f^{(\mu)}) = dp\, \langle \phi_i^{(\mu)} |\, e^{-i S_\mu(\hat{x})}\, \delta(p - \hat{p})\, e^{i S_\mu(\hat{x})}\, | \phi_i^{(\mu)} \rangle = dp\, \langle \phi_i^{(\mu)} |\, \delta\big(p - \hat{p} - \alpha_\mu(\hat{x})\big)\, | \phi_i^{(\mu)} \rangle. \tag{5.25}$$

The expression corresponds then to the "quantized" version of a classical probability model for the data, in which the pointer receives a definite kick $\alpha_\mu(x)$ given $x$, and the possible kicks are sampled over the posterior phase-space distribution for the apparatus. We emphasize that the correspondence becomes exact at the level of the expectation value, where, assuming a real (i.e., unbiased) initial state of the apparatus with $p_i = 0$,

$$\langle p_f \rangle = \int_x dP(x \mid \phi_i^{(\mu)})\, \alpha_\mu(x) = \int_x dP(x \mid \phi_i)\, L_\mu(x)\, \alpha_\mu(x); \tag{5.26}$$

the distribution for $x$ may equivalently be interpreted classically as the posterior distribution with a likelihood factor $L_\mu(x) \propto P(\psi_\mu \mid x\, \psi_1)$.

In this way, we fulfill the goal we initially set out for, namely to find an 
intuitive expression for the distribution of the data, under general conditions of the 
apparatus, in which the picture of sampling weak values is always at the forefront. 
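
The statement that the mean pointer shift equals the posterior average of the local weak value, Eq. (5.26), can be checked numerically. The sketch below assumes a hypothetical two-level system with eigenvalues ±1 for the measured observable, real pre- and post-selected states, and a real Gaussian apparatus wave function with $p_i = 0$; all names and parameter values are illustrative choices, not taken from the text.

```python
import cmath
import math

eigenvalues = (-1.0, 1.0)                     # spectrum of A (assumed)
psi_1 = (math.cos(0.4), math.sin(0.4))        # pre-selected state in the eigenbasis of A
psi_mu = (math.cos(1.1), math.sin(1.1))       # post-selected state
coeffs = [f * i for i, f in zip(psi_1, psi_mu)]   # <psi_mu|a_k><a_k|psi_1>, real here

def amplitude(x):
    """Transition amplitude <psi_mu| exp(i A x) |psi_1>."""
    return sum(c * cmath.exp(1j * a * x) for c, a in zip(coeffs, eigenvalues))

def local_weak_value(x):
    """alpha(x) = Re <psi_mu| A exp(i A x) |psi_1> / <psi_mu| exp(i A x) |psi_1>."""
    num = sum(c * a * cmath.exp(1j * a * x) for c, a in zip(coeffs, eigenvalues))
    return (num / amplitude(x)).real

sigma = 0.8                                   # width of the prior in x (assumed)
def phi_i(x):
    """Real (unbiased) Gaussian apparatus wave function, unnormalized."""
    return math.exp(-x * x / (4.0 * sigma * sigma))

n, half_width = 4001, 8.0
dx = 2.0 * half_width / (n - 1)
xs = [-half_width + j * dx for j in range(n)]
phi_f = [amplitude(x) * phi_i(x) for x in xs]     # unnormalized final wave function

# Direct route: <p> = sum Im(conj(phi) dphi/dx) / sum |phi|^2, central differences.
numerator = sum((phi_f[j].conjugate() * (phi_f[j + 1] - phi_f[j - 1]) / (2.0 * dx)).imag
                for j in range(1, n - 1))
norm = sum(abs(v) ** 2 for v in phi_f)
p_direct = numerator / norm

# Model route: posterior average of the local weak value; the weights are
# proportional to dP(x|phi_i) * L(x).
weights = [abs(v) ** 2 for v in phi_f]
p_model = sum(w * local_weak_value(x) for w, x in zip(weights, xs)) / sum(weights)
```

The two numbers agree up to discretization error, since the gradient of the phase of the amplitude is the local weak value.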

5.3 Interpretation in Terms of the Two-Vector Formulation

Let us briefly discuss some aspects of interpretation surrounding the three elements 
of our model. 

5.3.1 The Sampled States 

According to the two-vector formulation, at any given moment in time an initial and
final vector are needed to describe the system. What corresponds then to the "state",
i.e., analogous to the point in phase-space, is a pair of vectors in Hilbert space. In 
the model, the idea is therefore that by varying the parameter x we move from one 
pair to another. It is important then to note what these pairs are. 

Let us recall that in the classical situation considered at the end of the last chapter, when one fixes the end-points of the configuration variable trajectory, the whole trajectory becomes dependent on $x$. In particular, the phase-space points $(q_i, k_i)$ and $(q_f, k_f)$ are $x$-dependent. These points are then connected by the canonical transformation generated by the measured function $A(q)$ with parameter $x$, i.e.,

$$q_f(x) = q_i(x), \qquad p_f(x) = p_i(x) + x A'(q). \tag{5.27}$$

Here we have a similar situation. Denoting by $\omega(x)$ a pair of vectors at a given time, the description is given by the point

$$\omega_i(x) = \big( |\psi_1\rangle,\; e^{-i\hat{A}x} |\psi_\mu\rangle \big) \tag{5.28}$$

immediately before the measurement interaction, and by the point

$$\omega_f(x) = \big( e^{i\hat{A}x} |\psi_1\rangle,\; |\psi_\mu\rangle \big) \tag{5.29}$$

immediately after. The mechanical transformation of the system via back-reaction is then the map from the point $\omega_i(x)$ to the point $\omega_f(x)$ in the space of vector pairs, which is generated by the unitary operator $e^{i\hat{A}x}$:

$$\omega_f(x) = \big( e^{i\hat{A}x},\; e^{i\hat{A}x} \big)\, \omega_i(x). \tag{5.30}$$

A way of seeing this is to consider giving a finite time duration $T$ to the measurement of $\hat{A}$, and in between, at some time $\epsilon T$ (with $\epsilon < 1$) after the interaction is switched on, insert an impulsive but very weak measurement of some other observable $\hat{B}$ that does not commute with $\hat{A}$. In this case the parametric dependence of the amplitude function is on two variables, $x$ and the reaction variable of the other apparatus, call it $y$:

$$\langle \psi_\mu |\, e^{i\hat{A}x(1-\epsilon)}\, e^{i\hat{B}y}\, e^{i\hat{A}x\epsilon}\, | \psi_1 \rangle. \tag{5.31}$$

Concentrating then on a fixed value of $x$, the weak value of $\hat{B}$ at the point of no reaction, $y = 0$, is then

$$\mathrm{Re}\, \frac{\langle \psi_\mu |\, e^{i\hat{A}x(1-\epsilon)}\, \hat{B}\, e^{i\hat{A}x\epsilon}\, | \psi_1 \rangle}{\langle \psi_\mu |\, e^{i\hat{A}x}\, | \psi_1 \rangle}. \tag{5.32}$$

Thus we see that by moving from $\epsilon = 0$ to $\epsilon = 1$, the weak value of $\hat{B}$ changes from that evaluated at $\omega_i(x)$ to that evaluated at $\omega_f(x)$.

5.3.2 Weak Values 

We also saw how in the semi-classical picture, the weak value of the operator $A(\hat{q})$ is the function $A(q)$ evaluated at the classical trajectory given $x$; hence, at the point of no reaction $x = 0$, $A(q)$ is evaluated at the free trajectory, and so, for instance, the weak value of $q^2$ will be equal to the square of the weak value of $q$. In general, however, this shall not be the case. Not only will the weak value of $A^2$ and the square of the weak value of $A$ differ in general, but moreover the weak value of $A^2$ need not be positive.

In this respect, it is important to note that weak values are defined operationally as the response to an almost-perfect unitary transformation generated by the observable $A$; hence, one should not expect a priori any particular relation between the weak values of two operators that have a common spectral decomposition, as they may lead to entirely different transformations. It is therefore more convenient to think of weak values in terms of the algebra of generators of unitary transformations, where $A$ and $A^2$, although having common eigenvectors, may nevertheless be linearly independent. Note for instance that for a spin-1/2 particle, the operators $S_x^2$, $S_y^2$ and $S_z^2$ are proportional to the unit matrix. Therefore, they are generators of a trivial unitary transformation, namely an overall phase change. On the other hand, the square of the weak value of $S_x$, a generator of rotations, may take arbitrarily large values.
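
A small numerical illustration of this spin-1/2 remark follows. It is a sketch under assumed conventions ($\hbar = 1$, real two-component states in the $S_z$ basis, and nearly orthogonal pre- and post-selected states chosen by hand); the helper names are hypothetical.

```python
import math

def weak_value(post, op, pre):
    """<post|op|pre> / <post|pre> for real 2-component vectors and a real 2x2 matrix."""
    op_pre = [sum(op[r][k] * pre[k] for k in range(2)) for r in range(2)]
    numerator = sum(post[r] * op_pre[r] for r in range(2))
    denominator = sum(post[r] * pre[r] for r in range(2))
    return numerator / denominator

def matmul(m1, m2):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(m1[r][k] * m2[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

S_x = [[0.0, 0.5], [0.5, 0.0]]       # spin-1/2 operator S_x = sigma_x / 2
S_x_squared = matmul(S_x, S_x)       # equals 1/4 times the unit matrix

angle = 0.05                         # small angle: pre and post are nearly orthogonal
pre = [math.cos(angle), math.sin(angle)]
post = [math.sin(angle), math.cos(angle)]

wv_Sx = weak_value(post, S_x, pre)             # 1 / (2 sin(2*angle)), about 5
wv_Sx2 = weak_value(post, S_x_squared, pre)    # always 1/4
square_of_wv = wv_Sx ** 2                      # about 25, far from 1/4
```

Shrinking `angle` makes `square_of_wv` grow without bound while `wv_Sx2` stays fixed at 1/4, which is the distinction drawn in the paragraph above.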

5.3.3 Relative Initial State 

The interpretation of the relative initial state $|\phi_i^{(\mu)}\rangle$ as a sort of posterior initial
quantum state is admittedly a more delicate matter. As we argued at the beginning
of the chapter, this choice is practically determined by the semi-classical approxi- 
mation on the system in order to interpret the average shift in the momentum as 
the same effect both in the classical and quantum descriptions of the apparatus. 
Furthermore, we have also found it a very convenient way of analyzing the response 
of the apparatus, as we shall do in the next chapter, if the emphasis is placed on the 
reaction variable x as we have done all along. The intuition comes precisely because 
of the fact that in the x-representation, the likelihood transformation can then be 
interpreted in terms of classical probability. 

One may nevertheless wonder if a more direct connection to probability can
be established for the state $|\phi_i^{(\mu)}\rangle$ within the two-vector formulation. For this we
note that if only an initial vector is given, then there are infinitely many possible
final states to fill the missing slot. Hence the standard quantum state according to 
the two-vector description automatically 

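Thus for instance, if a post-selection is performed on the apparatus in a given basis $B = \{ |\chi_\nu\rangle \}$, the weak values $\tau$ of some apparatus observable $\hat{T}$ are distributed according to

$$dP(\tau \mid \phi_f\, B) = d\tau \sum_\nu |\langle \chi_\nu | \phi_f \rangle|^2\; \delta\!\left( \tau - \mathrm{Re}\, \frac{\langle \chi_\nu |\, \hat{T}\, | \phi_f \rangle}{\langle \chi_\nu | \phi_f \rangle} \right). \tag{5.33}$$

The mean value of this distribution, as one can then verify, is

$$\langle \tau \rangle = \langle \phi_f |\, \hat{T}\, | \phi_f \rangle, \tag{5.34}$$

the standard expectation value.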

One may then ask how this works with the same post-selection, but for the weak value of $\hat{T}$ at some time (a) before the measurement interaction, and (b) after some prior determination of the actual initial state $|\phi_i\rangle$. In this case one obtains a distribution similar to Eq. (5.33), except that the weak values in the argument of the delta function are now



with the same weights as before. The summation cannot be worked out any further than Eq. (5.33) without additional knowledge of the final basis. However, it is still possible to see two indications that the relative initial state $|\phi_i^{(\mu)}\rangle$ does convey some statistical information about the state of the apparatus before the interaction, if by this information we mean averages of weak values:

First, when T is any operator function of x, T(x), one can easily see that 
the weak values coincide both before and after the measurement interaction. This 
should not be too surprising as x is a constant of the motion. Noting therefore that 



we see that given any basis of post-selection on the apparatus, the average weak value of $T(\hat{x})$ before the interaction is given by its standard expectation value given the relative initial state $|\phi_i^{(\mu)}\rangle$.

Secondly, when T is not a function of x, one can still find a simple basis- 
independent expression for the average weak value before the interaction, namely: 




(5.35) 




(5.36) 



Tj = Re 



(4>i\P^^\xiJi)\(pi) 




(5.37)
The expression differs from the expectation value of $\hat{T}$ given the relative initial state $|\phi_i^{(\mu)}\rangle$ by an additional term involving the likelihood factor, the significance of which is unclear at this point. Nevertheless, one easily notes that when $\hat{T}$ is the pointer variable operator $\hat{p}$ itself,

Re(# } | [ y/Z~(&j, p ] |&> = Im(# } | ^|*?°> = 0. (5.38) 

This is an entirely satisfying result, as it then shows that the reference origin for the "kick", i.e.,

$$\langle p_i \rangle = \langle \phi_i^{(\mu)} |\, \hat{p}\, | \phi_i^{(\mu)} \rangle, \tag{5.39}$$

indeed corresponds to an average of the momentum operator, in this case its average weak value, immediately before the interaction occurred.



5.4 Connection with Likelihood Factor 

We shall now turn to more practical considerations regarding the model. As we 
have seen, forefront in the picture of sampling weak values is the distribution for 
the reaction variable x. As should be clear then, the distribution of interest is the 
posterior distribution 

$$dP(x \mid \phi_i^{(\mu)}) = dP(x \mid \phi_i)\, L_\mu(x) \tag{5.40}$$

and not its prior distribution, as in the classical situation. It is then this distri- 
bution, in which the Likelihood factor plays a decisive role, which defines then the 
appropriate conditions under which the model may be linearized. Also, away from 
the weak regime, we saw in Chapt. 2 how the mean and variance of the data are 
connected to the distribution in x. We have already seen how the mean is connected 
to the posterior distribution. What remains therefore is to establish the connection 
in the variance. In the following two sections we shall then consider the recovery of 
the linear model in the weak regime and the general connection between the "error 
laws" and the posterior distribution in x. Before doing so, a cautionary remark is 
in order: 

As its name implies, the non-linearity in the model stems from the fact that the response of the pointer variable is now seen as an impulse which is generally a non-linear function $\alpha_\mu(x)$ of $x$. Such a model could be separated in classical mechanics if $x$ and $p$ were initially uncorrelated; in that case, the model could be turned back into a linear model by a trivial redefinition of the variables, i.e., $A = \alpha(x)$. Clearly, such is not the case in the quantum version, as $\hat{p}$ and $\alpha_\mu(\hat{x})$ satisfy the commutation relation $[\hat{p}, \alpha_\mu(\hat{x})] = -i\alpha'_\mu(\hat{x}) \neq 0$.

The idea of sampling in the statistical sense must then be taken with some caution when dealing with the overall shape of the pointer variable distribution. One should keep in mind that the basic object is always the wave function

$$\phi_f^{(\mu)}(p) = \frac{1}{\sqrt{2\pi}} \int dx\, \sqrt{L_\mu(x)}\; \phi_i(x)\, e^{i S_\mu(x) - ipx}, \tag{5.41}$$

from which the conditional probability distribution is derived. The sampling should 
hence be understood in the sense of Chapter 2, by considering the wave function 
as a coherent superposition of narrow samples, each of which if narrow enough 
may then be treated as being shifted by the local weak value. The point then is 
that the elements of the superposition also carry phase information and hence the 
interference between the samples will show up in modifications to the moments of 
the pointer variable distribution which are higher than the first moment. 

5.5 Recovery of The Weak Linear Model 

Let us then consider the conditions for the recovery of the WLM as the model "sharpens" with $x$ to the point where a single weak value is sampled. This occurs when the magnitude of the relative initial wave function $\phi_i^{(\mu)}(x) = \sqrt{L_\mu(x)}\, \phi_i(x)$ is sharply peaked around a value $x_\mu$, in which case we can apply a "group velocity" approximation to Eq. (5.41) as in Chapter 2. A necessary condition for this is then that the nonlinear terms in the phase expansion $\alpha'_\mu(x_\mu)(x - x_\mu)^2 + \ldots$ may be neglected. Hence, assuming $\alpha_\mu(x_\mu)$ is not too small compared to the higher order terms, we demand that

$$\alpha'_\mu(x_\mu)\, \Delta x_\mu^2 \ll 1. \tag{5.42}$$

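In passing, we note that the gradient $\alpha'_\mu(x)$ of the local weak value may also be expressed in terms of the imaginary part of a complex variance-like quantity, namely

$$\alpha'_\mu(x) = -\mathrm{Im} \left[ \frac{\langle \psi_\mu |\, \hat{A}^2 e^{i\hat{A}x}\, | \psi_1 \rangle}{\langle \psi_\mu |\, e^{i\hat{A}x}\, | \psi_1 \rangle} - \left( \frac{\langle \psi_\mu |\, \hat{A}\, e^{i\hat{A}x}\, | \psi_1 \rangle}{\langle \psi_\mu |\, e^{i\hat{A}x}\, | \psi_1 \rangle} \right)^2 \right], \tag{5.43}$$

as can be shown with relative ease.

Let us suppose that $S(x)$ can be expanded in a Taylor series around $x_\mu$ up to the linear terms in $(x - x_\mu)$. The final wave function is then a rigid translation of the Fourier transform $\phi_i^{(\mu)}(p)$,

$$\phi_f^{(\mu)}(p) \simeq e^{i S_\mu(x_\mu) - i \alpha_\mu(x_\mu) x_\mu}\; \phi_i^{(\mu)}\big( p - \alpha_\mu(x_\mu) \big), \tag{5.44}$$

where the translation is the local weak value evaluated at $x_\mu$. Thus we recover in the resulting conditional distribution the statistically separable form under the WLM,

$$dP(p \mid \phi_f^{(\mu)}) \simeq dP\big( p - \alpha(x_\mu) \mid \phi_i^{(\mu)} \big), \tag{5.45}$$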

where a single weak value is sampled. It is important to note two generalizations from the WLM as discussed in Chapt. 1. First is the obvious generalization to a different "sampling" point away from $x = 0$, as we saw in Chapt. 2. More important now is the fact that what determines the linearity condition is the relative initial state $|\phi_i^{(\mu)}\rangle$ and not the initial state $|\phi_i\rangle$; hence, the probability distribution that gets shifted is not in general $dP(p \mid \phi_i)$ but rather $dP(p \mid \phi_i^{(\mu)})$. This leads us then to consider the conditions on the likelihood factor under which "sharpness in $x$" is achieved.

For this we may concentrate on the probability distribution as opposed to the amplitude, and draw the intuition from the classical situation described earlier. Similarly to what we saw there, it is the interplay between the prior $dP(x \mid \phi_i)$ and the likelihood factor $L_\mu(x)$ that ultimately determines whether a weak regime of sharp $x$ is achieved or not. Here, the condition that $dP(x \mid \phi_i)$ is "sufficiently sharp" around some value $x_0$, say that $\Delta x$ be small compared to $|\alpha_\mu(x_\mu)/\alpha'_\mu(x_\mu)|$, may not be enough to guarantee that the posterior distribution $dP(x \mid \phi_i)\, L_\mu(x)$ will be sharp as well. It could happen, for instance, that the transition probability $P(\psi_\mu \mid x\, \psi_1)$ in the likelihood factor has a minimum at $x_0$ and rises so fast that the posterior distribution is considerably widened or "dented", as in Fig. 4.2. Or, it could be that somewhere around the tail regions of $dP(x \mid \phi_i)$, the transition probability $P(\psi_\mu \mid x\, \psi_1)$ becomes overwhelmingly large. In that case, the mass of the distribution shifts to that region where the values of $x$ are most favorable to the transition.
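
The "dented" posterior can be illustrated with a toy computation. This is only a sketch: it assumes a Gaussian prior of width `sigma` centered at $x_0 = 0$ and a hypothetical transition probability proportional to $\sin^2 x$, which has a minimum (a zero) exactly at the center of the prior. Neither choice comes from the text.

```python
import math

def variance(weight, half_width, n=4001):
    """Variance of the distribution proportional to weight(x) on [-half_width, half_width]."""
    dx = 2.0 * half_width / (n - 1)
    xs = [-half_width + j * dx for j in range(n)]
    ws = [weight(x) for x in xs]
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    return sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total

sigma = 0.1                                    # prior width in x (assumed)

def prior(x):
    """Unnormalized Gaussian prior density dP(x|phi_i)/dx."""
    return math.exp(-x * x / (2.0 * sigma * sigma))

def likelihood(x):
    """Hypothetical transition probability with a minimum at x = 0."""
    return math.sin(x) ** 2

prior_var = variance(prior, 10.0 * sigma)
posterior_var = variance(lambda x: prior(x) * likelihood(x), 10.0 * sigma)
ratio = posterior_var / prior_var              # close to 3 for a narrow prior
```

Even though the prior is sharp, the posterior is bimodal and its variance is roughly three times larger, so a single sampling point is not a good description in this case.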

If one is therefore interested in probing the local weak value close to some point $x_0$, the "probe" $dP(x \mid \phi_i)$ needs to be sufficiently robust against the likelihood factor (see Fig. 5.1). It is important to note in this respect that unless the support of $dP(x \mid \phi_i)$ is completely severed outside the region of interest, the robustness condition is generally a global condition if $dP(x \mid \phi_i)$ has tails stretching out to infinity. However, as the transition probability $P(\psi_\mu \mid x\, \psi_1)$ can never exceed unity anywhere, robustness can always be achieved by imposing a sufficiently fast fall-off rate of $dP(x \mid \phi_i)$ outside the region of interest.

Now, assuming that the prior is then both sharp and robust against the likelihood factor, the leading order effect of the latter should then be a small "shift" towards regions in $x$ favorable to the transition, similar to the one illustrated in Fig. 4.2. The resulting "sampling point" $x = x_\mu$, or point of maximum likelihood produced by this bias, may then be interpreted as the most likely value of $x$ at which the transition occurred. In other words, we view the system as suffering a "back-reaction" $\simeq e^{i\hat{A}x_\mu}$.

Figure 5.1: Robustness/docility of three prior distributions against the likelihood factor (panels labeled "ROBUST" and "DOCILE"). A relatively sharp prior may not be sufficiently robust to ensure that the posterior distribution is sharp as well, if for instance, its rate of fall-off is slower than the rise rate of the likelihood factor.

To a leading approximation, this sampling point may be obtained by a Taylor expansion of $L_\mu(x)$ about $x_0$ and keeping the first-order term. Now, if one writes the complex local weak value $A_w(x)$ as

$$A_w(x) \equiv \frac{\langle \psi_\mu |\, \hat{A}\, e^{i\hat{A}x}\, | \psi_1 \rangle}{\langle \psi_\mu |\, e^{i\hat{A}x}\, | \psi_1 \rangle} = \alpha_\mu(x) + i \beta_\mu(x), \tag{5.46}$$

it is then easy to see that the first derivative of the log-likelihood factor is

$$\frac{d}{dx} \log L_\mu(x) = -2 \beta_\mu(x), \tag{5.47}$$

where $\beta_\mu(x)$ is the imaginary part of the complex local weak value $A_w(x)$; thus,
one may approximate locally as

$$L_\mu(x) \simeq L_\mu(x_0)\, e^{-2 \beta_\mu(x_0)(x - x_0)}. \tag{5.48}$$

If $dP(x \mid \phi_i)$ then satisfies a sufficiently rapid fall-off condition, then the exponential approximation of Aharonov and Vaidman is valid and the shift can be viewed as an imaginary shift of the initial wave function $\phi_i(x)$ by the imaginary part of the weak value. This produces to leading order a shift from the a priori sampling point $x_0$ of approximately

$$-2\, \Delta x^2\, \mathrm{Im}\, A_w(x_0), \tag{5.49}$$

and in turn this provides an operational interpretation of the imaginary part of the weak value.

However, the idea of a sharply defined sampling point may be inadequate,
for example, if the posterior distribution is considerably wider than the initial distribution
but the weak regime can still be achieved (i.e., $\alpha(x)$ varies slowly). In such a
case, corrections to the width in x must be taken into account. In the case of a Gaussian
packet, the leading correction to the width in x in the Gaussian approximation
is easily obtained:

$$\Delta x_\mu^2 \simeq \frac{\Delta x^2}{1 + 2\,\Delta x^2\,\beta'(x_\mu)}. \qquad (5.50)$$

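We note that, similarly to the case of the real part of the complex weak value in Eq.
(5.43), the gradient $\beta'(x)$ is given in terms of the real part of the same variance-like
quantity, i.e.,

$$\beta'(x) = \mathrm{Re}\left[\frac{\langle\psi_\mu|\hat{A}^2 e^{i\hat{A}x}|\psi_i\rangle}{\langle\psi_\mu|e^{i\hat{A}x}|\psi_i\rangle} - \left(\frac{\langle\psi_\mu|\hat{A}\,e^{i\hat{A}x}|\psi_i\rangle}{\langle\psi_\mu|e^{i\hat{A}x}|\psi_i\rangle}\right)^2\right]. \qquad (5.51)$$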



These expressions, in conjunction with Eq. (5.43), also determine that (as already
seen in Chapter 2)

$$\Delta x^2 \left|\frac{dA_w^\mu}{dx}\right| \ll 1 \qquad (5.52)$$

is a necessary local condition which, in addition to the global robustness condition,
guarantees that the response of the apparatus can be described in terms of a complex
shift by $A_w^\mu$ of the initial wave function in the p-representation (here assuming
that the sampling point $x_\mu \simeq 0$).



5.6 Error Laws 

We now turn to the connection between the likelihood factor and the "error law" for
the conditional distribution $dP(p|\phi_f^{(\mu)})$, for which we now assume that the variance
exists. Assuming for simplicity a real state with $p_i = 0$, the variance is then given
by

$$\Delta p_f^2 = \langle\phi_i^{(\mu)}|\hat{p}^2|\phi_i^{(\mu)}\rangle + \Delta\alpha_\mu^2, \qquad (5.53)$$

where $\Delta\alpha_\mu^2$ is the variance of the local weak value with respect to the posterior distribution
in x and demands no explanation. On the other hand, it is in the variance of p in
the first term where we begin to see differences between the classical and quantum
mechanical probability models for the apparatus. As mentioned earlier, classically
one can completely eliminate the correlations between p and x in the posterior initial
distribution. In such a case the posterior initial variance in p remains the same as its
prior. On the other hand, this cannot happen quantum mechanically, as it would lead
to a violation of the uncertainty principle if, for instance, the posterior distribution
in x is narrower than its prior.

To see what the effect of the likelihood factor is on the initial variance,
let us assume that both the likelihood factor and $\phi_i(x)$ are twice-differentiable.
Now define, for a given probability distribution, a "quadrature" $Q(x)$,

$$Q(x) = -\frac{d^2}{dx^2}\log\frac{dP}{dx}, \qquad (5.54)$$

so that, for instance, a normal distribution has a constant quadrature $1/\sigma^2$. The
r.h.s. are the corresponding expectation values of p and $\alpha(x)$ taken with the reference
state $|\phi_i^{(\mu)}\rangle$. We then have, for the posterior distribution, a quadrature

$$Q_\mu(x) = Q_i(x) + 2\beta'(x), \qquad (5.55)$$

where $Q_i(x)$ is the quadrature of $dP(x|\phi_i)$. For a real state, one can then easily
show that

$$\langle\hat{p}^2\rangle = \frac{1}{4}\langle Q(x)\rangle. \qquad (5.56)$$

Hence we can recast (5.53) as the "error law"

$$\Delta p_f^2 = \frac{1}{4}\langle Q_i(x)\rangle + \frac{1}{2}\langle\beta'(x)\rangle + \langle\Delta\alpha_\mu^2(x)\rangle, \qquad (5.57)$$

where now all averages can be taken with respect to the posterior distribution in x.
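The relation between the mean of $p^2$ and the mean quadrature for a real state can be checked numerically. The following is a minimal sketch, assuming units with $\hbar = 1$ and an illustrative non-Gaussian density $\rho(x) \propto e^{-x^2/2 - x^4/4}$ with real wave function $\phi = \sqrt{\rho}$; the density and the function name are hypothetical choices made only for this check.

```python
import math


def check_quadrature_identity(n_points=20001, x_max=8.0):
    """Return (<p^2>, <Q>) for phi = sqrt(rho), rho ~ exp(-x^2/2 - x^4/4).

    The density is an illustrative assumption, not taken from the text.
    """
    h = 2.0 * x_max / (n_points - 1)
    norm = p2 = q_mean = 0.0
    for k in range(n_points):
        x = -x_max + k * h
        w = h if 0 < k < n_points - 1 else h / 2.0   # trapezoid weights
        rho = math.exp(-x ** 2 / 2.0 - x ** 4 / 4.0)  # unnormalized density
        dlog = -(x + x ** 3)                          # d/dx log rho
        quad = 1.0 + 3.0 * x ** 2                     # Q(x) = -d^2/dx^2 log rho
        norm += w * rho
        p2 += w * rho * (dlog / 2.0) ** 2             # (phi')^2 = rho * (dlog/2)^2
        q_mean += w * rho * quad
    return p2 / norm, q_mean / norm


if __name__ == "__main__":
    p2, q = check_quadrature_identity()
    print("<p^2> =", p2, "  <Q>/4 =", q / 4.0)
```

The two printed numbers should agree to numerical precision, since the identity follows from a single integration by parts for any smooth, rapidly decaying real state.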

Noting then that for the initial state the average of $p^2$ is the first term
$\frac{1}{4}\langle Q_i(x)\rangle$, but averaged over $dP(x|\phi_i)$, we see that there is a sense in which real
Gaussian states may be regarded as the least biased of test functions, for in that
case the average $\langle Q_i(x)\rangle$ coincides in the prior and posterior cases, and one
then has

$$\Delta p_f^2 = \Delta p_i^2 + \frac{1}{2}\langle\beta'(x)\rangle + \langle\Delta\alpha_\mu^2(x)\rangle, \qquad (5.58)$$

where $\Delta p_i^2$ is the variance with respect to the initial Gaussian state. In that case
one may interpret the error law as the contribution of three terms: the initial
noise, plus the variance in the sampled weak values, plus an additional correction to
the width due to the likelihood factor.

Let us now see what happens when the posterior distribution in x becomes
sufficiently narrow about a sampling point $x_\mu$. In that case, the uncertainty in the
weak value may be disregarded altogether, and the average of $\beta'(x)$ may be replaced
by its value at the sampling point. Assuming an initial Gaussian state, we then have
the "error law" for weak measurements:

$$\Delta p_f^2 = \Delta p_i^2 + \frac{1}{2}\beta'(x_\mu). \qquad (5.59)$$

This expression may be cast in a slightly more intuitive form by recalling from Eq.
(5.51) that $\beta'(x_\mu)$ may be expressed in terms of a variance-like quantity. However, the
formal similarity to a true error law, where this quantity is viewed as something of a
"weak uncertainty", should not be pushed too far. For one, the factor of 1/2 has
no place in such an error law, at least in a linear model. More important, however,
is an interesting consequence of the fact that "likelihood in x has effects in p":

Suppose that the likelihood factor is a minimum at the sampling point, so
that $\beta(x_\mu) = 0$ and $\beta'(x_\mu) < 0$. In that case one should see a "stretch/squeeze"
effect: a decrease in the variance with respect to that of the initial distribution, an
effect which of course would be impossible to understand classically if x and p were
assumed to be uncorrelated a priori.

The "stretch/squeeze" effect is characteristic of the weak regime only. Since
the transition probability $P(\psi_\mu|x\,\psi_i)$ can never exceed unity, it is guaranteed that
if the likelihood factor has a local minimum, then it must also have at least two
local maxima, perhaps at infinity, where $\beta' > 0$. Since in the strong regime the
prior distribution in x will be docile with respect to the likelihood factor, the
predominant contribution to $\langle\beta'\rangle$ will come from precisely those regions of maximal
likelihood where $\beta' > 0$. An illustration of the "stretch/squeeze" effect will be given
in the next chapter.

5.6.1 Pooling The Data 

Using the results of the previous section, we finish this chapter by seeing
how the standard error laws of the unconditional distribution, i.e.,

$$\langle p\rangle_f = \langle\psi_i|\hat{A}|\psi_i\rangle, \qquad \Delta p_f^2 = \Delta p_i^2 + \langle\psi_i|\Delta\hat{A}^2|\psi_i\rangle, \qquad (5.60)$$

are recovered in the "pooling" of the data from all the post-selected sub-samples.
For simplicity we assume an initial Gaussian state with $p_i = 0$. The sum rule for
the expectation value is simple enough. With the bar-average, weighted with the
transition probabilities $P(\psi_\mu|\phi_i\,\psi_i)$, one clearly has

$$\overline{\langle p_f\rangle} = \overline{\langle\alpha\rangle} = \langle\psi_i|\hat{A}|\psi_i\rangle. \qquad (5.61)$$

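The standard average can then be interpreted as a double average of the weak values,
first over x given a specific transition, then as an average over all transitions. For the
variance, we need the conditional averages of $p^2$, which are given by

$$\langle p_f^2\rangle = \langle p_i^2\rangle + \frac{1}{2}\langle\beta'\rangle + \langle\alpha^2\rangle, \qquad (5.62)$$

and from this we then obtain

$$\overline{\langle p_f^2\rangle} - \overline{\langle p_f\rangle}^{\,2} = \langle p_i^2\rangle + \frac{1}{2}\overline{\langle\beta'\rangle} + \overline{\langle\alpha^2\rangle} - \overline{\langle\alpha\rangle}^{\,2}. \qquad (5.63)$$

Comparing with the expression for the variance in the unconditional distribution,
we see that

$$\langle\psi_i|\Delta\hat{A}^2|\psi_i\rangle = \frac{1}{2}\overline{\langle\beta'\rangle} + \overline{\Delta\alpha^2} + \overline{\langle\alpha\rangle^2} - \overline{\langle\alpha\rangle}^{\,2}. \qquad (5.64)$$

The remaining terms after the first are easy to understand: $\overline{\Delta\alpha^2}$ is the average variance of
$\alpha$ over all sub-samples, and $\overline{\langle\alpha\rangle^2} - \overline{\langle\alpha\rangle}^{\,2}$ is the scatter in the sub-sample averages. It is
interesting to note, however, that neither of these two terms yields an expression
that is independent of the final basis; in other words, the scatter in the weak values
generally carries a trace of the final choice of measurement. The trace is "covered"
by the first, average squeeze term.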

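Finally, let us turn to the pooling of the errors in the case of weak measurements.
For simplicity let us take a Gaussian prior in the limit $\Delta x \to 0$. Let us also
assume, for all transitions, that $|\langle\psi_\mu|\psi_i\rangle|^2 \neq 0$ and robustness, so that the sampling
point tends to x = 0 for all transitions as $\Delta x \to 0$. We thus take the weights to
be the unperturbed weights $|\langle\psi_\mu|\psi_i\rangle|^2$ and neglect the dispersion in $\alpha$; in such a case
Eq. (5.64) simplifies to

$$\langle\psi_i|\Delta\hat{A}^2|\psi_i\rangle = \frac{1}{2}\overline{\beta'} + \overline{\alpha^2} - \overline{\alpha}^{\,2}, \qquad (5.65)$$

where for each transition we take $\alpha = \mathrm{Re}\,\frac{\langle\psi_\mu|\hat{A}|\psi_i\rangle}{\langle\psi_\mu|\psi_i\rangle}$ and

$$\beta' = \mathrm{Re}\left[\frac{\langle\psi_\mu|\hat{A}^2|\psi_i\rangle}{\langle\psi_\mu|\psi_i\rangle} - \left(\frac{\langle\psi_\mu|\hat{A}|\psi_i\rangle}{\langle\psi_\mu|\psi_i\rangle}\right)^2\right]. \qquad (5.66)$$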
The above therefore gives an interpretation of the increase in the variance in the
unconditional distribution as made up of both the scatter in the weak values and
the average effect on the widths of the initial distributions. It is also worth noting
another, less operational interpretation of the unconditional variance in terms of the
transition variances of the real and imaginary parts of the weak value. Noting that
the bar-average of $\beta = \mathrm{Im}\,\frac{\langle\psi_\mu|\hat{A}|\psi_i\rangle}{\langle\psi_\mu|\psi_i\rangle}$ vanishes, it is then easily verified that [25]

$$\langle\psi_i|\Delta\hat{A}^2|\psi_i\rangle = \overline{\alpha^2} - \overline{\alpha}^{\,2} + \overline{\beta^2} - \overline{\beta}^{\,2}. \qquad (5.67)$$

As the expression is now written as the sum of positive-definite quantities for each
transition, as opposed to Eq. (5.65), we obtain the general result that the scatter
in $\alpha$ around $\langle\psi_i|\hat{A}|\psi_i\rangle$ is never larger than the uncertainty $\sqrt{\langle\psi_i|\Delta\hat{A}^2|\psi_i\rangle}$.
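The pooling identity for the weak limit, in which the unconditional variance equals the summed transition variances of the real and imaginary parts of the weak value, can be illustrated numerically. The sketch below is only a check under stated assumptions: a randomly generated Hermitian observable, a random initial state, and a random complete orthonormal set of final states, with the unperturbed weights $|\langle\psi_\mu|\psi_i\rangle|^2$. All helper names are hypothetical.

```python
import math
import random


def inner(u, v):
    """Hermitian inner product <u|v> for lists of complex numbers."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(inner(v, v).real)
    return [a / norm for a in v]


def random_vector(dim, rng):
    return [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim)]


def random_basis(dim, rng):
    """Orthonormal basis obtained by Gram-Schmidt on random vectors."""
    basis = []
    for _ in range(dim):
        v = random_vector(dim, rng)
        for b in basis:
            overlap = inner(b, v)
            v = [a - overlap * bk for a, bk in zip(v, b)]
        basis.append(normalize(v))
    return basis


def pooled_statistics(dim=4, seed=1):
    """Compare <Delta A^2> with the pooled weak-value variances."""
    rng = random.Random(seed)
    eigvecs = random_basis(dim, rng)
    eigvals = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    psi_i = normalize(random_vector(dim, rng))
    finals = random_basis(dim, rng)

    # A|psi_i> built from the spectral decomposition of a Hermitian A.
    a_psi = [0j] * dim
    for lam, e in zip(eigvals, eigvecs):
        coeff = lam * inner(e, psi_i)
        a_psi = [s + coeff * ek for s, ek in zip(a_psi, e)]

    mean_a = inner(psi_i, a_psi).real
    var_a = inner(a_psi, a_psi).real - mean_a ** 2

    alpha_bar = alpha2_bar = beta_bar = beta2_bar = 0.0
    for f in finals:
        amp = inner(f, psi_i)             # <psi_mu|psi_i>
        weight = abs(amp) ** 2            # unperturbed transition probability
        weak = inner(f, a_psi) / amp      # complex weak value at x = 0
        alpha_bar += weight * weak.real
        alpha2_bar += weight * weak.real ** 2
        beta_bar += weight * weak.imag
        beta2_bar += weight * weak.imag ** 2

    pooled = alpha2_bar - alpha_bar ** 2 + beta2_bar - beta_bar ** 2
    return {"mean_A": mean_a, "var_A": var_a, "alpha_bar": alpha_bar,
            "beta_bar": beta_bar, "pooled": pooled}


if __name__ == "__main__":
    print(pooled_statistics())
```

In this check the bar-average of $\alpha$ reproduces the standard expectation value, the bar-average of $\beta$ vanishes, and the pooled quantity matches the standard variance, independently of the random final basis chosen.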
Chapter 6

The Non-Linear Model in 
Action 

6.1 Eccentric Weak Values and Super Oscillations 

According to the results at the end of the last chapter, for a pre-selected sample
the uncertainty in $\alpha$ around the standard expectation value is always smaller than or
equal to the standard uncertainty. Thus, one is to expect that for the overwhelming
majority of boundary conditions, the weak value of $\hat{A}$ will not show significant
deviations from the range of expectation defined by the spectrum of $\hat{A}$. The "pearls"
of weak measurements are, however, those exceptional circumstances where the 
conditional expectation value lies outside, perhaps significantly outside, this prior 
region of expectation. As we saw in Chapter 2, these effects could be understood 
in the representation of the pointer variable as a curious interference phenomenon, 
whereby wave functions shifted by the eigenvalues of A interfere destructively in 
the "normal" region and somehow, almost magically, interfere constructively at the 
location determined by the weak value. 

On the other hand, as we have argued within the model, the real part of
the weak value corresponds to a unitary transformation defined by the phase of
the transition amplitude $\langle\psi_\mu|e^{i\hat{A}x}|\psi_i\rangle$, a transformation that becomes a definite
kick to the same extent to which x becomes definite in its posterior distribution.
The "magic" must therefore be related to an anomalous behavior of the amplitude
function around the sampled region, and, in particular, of the phase factor. This
little-known phenomenon in Fourier analysis is that of super-oscillations:
a synthesis of Fourier modes which locally exhibits an oscillation frequency outside
of its Fourier spectrum [26, 27]. Let us now look at a simple way of generating such
functions.



6.1.1 N-spins 



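For concreteness, consider first what turns out to be in fact a rather innocuous
example: a single spin-1/2 particle pre- and post-selected in the eigenstates $|-\gamma/2\rangle$ and
$|\gamma/2\rangle$ of

$$-\sin\left(\frac{\gamma}{2}\right)\hat{S}_x + \cos\left(\frac{\gamma}{2}\right)\hat{S}_z = \frac{1}{2}, \qquad \sin\left(\frac{\gamma}{2}\right)\hat{S}_x + \cos\left(\frac{\gamma}{2}\right)\hat{S}_z = \frac{1}{2}, \qquad (6.1)$$

respectively, and an intermediate measurement of $\hat{A} = \hat{S}_z$. When $\gamma = \pi/2$ this is
the situation described in the introductory chapter, where the weak value of $\hat{A}$ is
the vector sum of the initial and final spin directions, i.e., $\sqrt{2}/2$. For other values of
$\gamma$, the weak value of $\hat{A}$ at x = 0 is easily computed by adding the two "constraint"
equations (6.1) and "solving" for $\hat{S}_z$:

$$\alpha(0) = \frac{\langle\gamma/2|\hat{S}_z|-\gamma/2\rangle}{\langle\gamma/2|-\gamma/2\rangle} = \frac{1}{2\cos\left(\frac{\gamma}{2}\right)}, \qquad (6.2)$$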



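and thus, for instance, if $\gamma \simeq 0.997\pi$, $\alpha(0) \simeq 100$. Now turn to the behavior of the
transition amplitude as a function of x:

$$\langle\gamma/2|e^{i\hat{A}x}|-\gamma/2\rangle = \langle\gamma/2|-\gamma/2\rangle\left[\cos\left(\frac{x}{2}\right) + i\,2\alpha(0)\sin\left(\frac{x}{2}\right)\right], \qquad (6.3)$$

where the phase (here denoted by $\eta(x)$ to avoid confusion) is

$$\eta(x) = \arctan\left[2\alpha(0)\tan\left(\frac{x}{2}\right)\right], \qquad (6.4)$$

and the likelihood factor is

$$L(x) \propto 1 + \left(4\alpha(0)^2 - 1\right)\sin^2\left(\frac{x}{2}\right). \qquad (6.5)$$

As one can see, the function is made up of two modes $e^{\pm ix/2}$, and yet we find an
instantaneous frequency of oscillation in the phase

$$\alpha(x) = \eta'(x) = \frac{\alpha(0)}{1 + \left(4\alpha(0)^2 - 1\right)\sin^2\left(\frac{x}{2}\right)}, \qquad (6.6)$$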
which lies outside the bounds of the spectrum when

$$|x| < 2\arcsin\left(\frac{1}{\sqrt{2\alpha(0) + 1}}\right) \simeq \sqrt{\frac{2}{\alpha(0)}}. \qquad (6.7)$$



Unfortunately, the instantaneous frequency of oscillation does not show up in faster "wiggles",
as the anomalous region is simply too small. The absence of fast "wiggles"
translates in turn to a very low significance in the effect on the expectation value of
the pointer variable. To see this, note that the denominator in the weak value scales
as the likelihood factor itself. This means that when $\Delta x$ is small and $\alpha(0)$ large,
the expectation value of $\alpha(x)$ will be

$$\langle\alpha(x)\rangle \simeq \frac{\alpha(0)}{\alpha(0)^2\Delta x^2 + 1}. \qquad (6.8)$$

Thus, to produce an average shift of $\sim\alpha(0)$, we need $\Delta x \ll 1/\alpha(0)$; this entails
that the uncertainty in the pointer will be much greater than the signal itself.
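The single-spin expressions above lend themselves to a quick numerical check. The sketch below evaluates the local weak value and likelihood factor of the single-spin example, the half-width of the anomalous region, and the posterior average of $\alpha(x)$. It assumes, for illustration only, a Gaussian prior in x centered at zero whose standard deviation plays the role of $\Delta x$; the function names are hypothetical.

```python
import math


def alpha(x, a0):
    """Local weak value alpha(x) for the single-spin example."""
    return a0 / (1.0 + (4.0 * a0 ** 2 - 1.0) * math.sin(x / 2.0) ** 2)


def likelihood(x, a0):
    """Likelihood factor L(x), up to normalization."""
    return 1.0 + (4.0 * a0 ** 2 - 1.0) * math.sin(x / 2.0) ** 2


def anomalous_half_width(a0):
    """|x| below which alpha(x) exceeds the largest eigenvalue 1/2."""
    return 2.0 * math.asin(1.0 / math.sqrt(2.0 * a0 + 1.0))


def posterior_mean_alpha(a0, dx, n_points=4001, n_sigma=8.0):
    """<alpha> under the posterior L(x) * Gaussian prior (std dx, assumed)."""
    h = 2.0 * n_sigma * dx / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        x = -n_sigma * dx + k * h
        post = likelihood(x, a0) * math.exp(-x ** 2 / (2.0 * dx ** 2))
        num += post * alpha(x, a0)
        den += post
    return num / den


if __name__ == "__main__":
    a0, dx = 100.0, 0.005
    print("anomalous half-width:", anomalous_half_width(a0))
    print("numerical <alpha>   :", posterior_mean_alpha(a0, dx))
    print("approximation       :", a0 / (a0 ** 2 * dx ** 2 + 1.0))
```

With $\alpha(0) = 100$ the anomalous region is only about 0.14 wide, and a prior as narrow as $\Delta x = 0.005$ already reduces the average kick to roughly 80, which illustrates how demanding the condition $\Delta x \ll 1/\alpha(0)$ is.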

Consider, however, what happens under the following re-scaling jS]: instead
of measuring $\hat{A}$ on a single spin, we measure, for instance sequentially, the "average"
operator $\frac{1}{N}\sum_i \hat{A}_i$ on a system of N non-interacting spins, each one pre- and post-selected
in the same states as above. In such a case, the relevant transition amplitude
can be expressed as an N-fold product of the single-particle amplitude, with x scaled
down by a factor of N:

$$\langle\gamma/2|e^{i\hat{A}x/N}|-\gamma/2\rangle^N. \qquad (6.9)$$

The spectrum is still within the same bounds as before, except that now it is far
richer; the modes are now $e^{-ix/2},\ e^{-i(1-2/N)x/2},\ \ldots,\ e^{i(1-2/N)x/2},\ e^{+ix/2}$. The phase and
likelihood factor can then be expressed in terms of their single-particle counterparts
$\eta(x)$, $L(x)$, as

$$\eta_{(N)}(x) = N\,\eta\left(\frac{x}{N}\right), \qquad L_{(N)}(x) = L^N\left(\frac{x}{N}\right); \qquad (6.10)$$

thus, for the weak value we have

$$\alpha_{(N)}(x) = \alpha\left(\frac{x}{N}\right). \qquad (6.11)$$

Assuming $\alpha(0) \gg 1$, the "kick" now behaves around the origin as

$$\alpha_{(N)}(x) \simeq \frac{\alpha(0)}{1 + 4\alpha(0)^2\sin^2\left(\frac{x}{2N}\right)} = \alpha(0) - \alpha(0)^3\left(\frac{x}{N}\right)^2 + \cdots, \qquad (6.12)$$
Figure 6.1: Local superoscillatory behavior of the real part of $\langle\gamma/2|e^{i\hat{A}x/N}|-\gamma/2\rangle^N$ for
$\alpha(0) = 5$ and three values of N. The case N = 1 corresponds to the fastest Fourier
mode in all three cases.



and the N-spin likelihood factor as

$$L_{(N)}(x) \propto \left[1 + 4\alpha(0)^2\sin^2\left(\frac{x}{2N}\right)\right]^N = 1 + \alpha(0)^2\,\frac{x^2}{N} + \cdots. \qquad (6.13)$$



What we can do then is fix some arbitrary interval in x around x = 0, say $-l < x < l$,
and choose a value of N large enough so that within this interval the likelihood factor
is essentially flat and $\alpha(x)$ essentially a constant, so that the amplitude function
behaves as

$$\langle\gamma/2|e^{i\hat{A}x/N}|-\gamma/2\rangle^N \propto e^{i\alpha(0)x} \qquad (6.14)$$

within the interval. With this prescription, one can construct a function which
locally "wiggles" faster (see Fig. 6.1) than any one of its Fourier components for
an arbitrarily large number of periods.
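A short numerical sketch can make the N-spin construction concrete. It builds the N-fold amplitude from the single-spin factor $\cos(x/2N) + 2i\alpha(0)\sin(x/2N)$ (dropping the constant overlap prefactor), estimates the local frequency of the phase by a central difference, and evaluates how much the log-likelihood rises between x = 0 and x = N$\pi$. The parameter values $\alpha(0) = 5$, N = 50 follow the figures; the function names and step size are assumptions of this sketch.

```python
import math


def amplitude_n(x, a0, n):
    """N-spin amplitude, up to the constant factor <gamma/2|-gamma/2>^N."""
    y = x / (2.0 * n)
    return (math.cos(y) + 2j * a0 * math.sin(y)) ** n


def local_frequency(x, a0, n, h=1e-5):
    """Derivative of the phase, Im(g'/g), by a central difference."""
    g_plus = amplitude_n(x + h, a0, n)
    g_minus = amplitude_n(x - h, a0, n)
    return ((g_plus - g_minus) / (2.0 * h * amplitude_n(x, a0, n))).imag


def log_likelihood_rise(a0, n):
    """log of L_(N)(N*pi) / L_(N)(0); equals N * log(4 * a0**2)."""
    return 2.0 * math.log(abs(amplitude_n(n * math.pi, a0, n)))


if __name__ == "__main__":
    a0, n = 5.0, 50
    for x in (0.0, 0.5, 1.0):
        print(x, local_frequency(x, a0, n))   # stays close to a0 = 5
    print("fastest Fourier mode: 0.5")
    print("log-likelihood rise :", log_likelihood_rise(a0, n))
```

Within $|x| < 1$ the local frequency stays within about one percent of $\alpha(0) = 5$, ten times the fastest Fourier mode, while the magnitude squared grows by a factor of $100^{50}$ between the origin and $x = N\pi$.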

As suggested earlier, we then have a prescription for "raising the signal above
the noise". As one can see from the behavior of the likelihood factor, the requirement
on $\Delta x$ is now

$$\Delta x \ll \frac{\sqrt{N}}{\alpha(0)}; \qquad (6.15)$$

thus, the significance ratio $\alpha(0)/\Delta p$ is now raised by a factor of $\sqrt{N}$. Furthermore,
note that the leading correction to $\alpha(0)$ scales in this case as $1/N^2$, and therefore,
if the ratio

e = ^L (6 . 16) 

is small, the relative uncertainty in a is 

Aa 6 (6-17) 



a(0) ' 

a factor of $\sqrt{N}$ smaller. Hence, in the $N \to \infty$ limit, it is possible to attain an effect
on the pointer that is both as significant and as precise as one desires.

6.1.2 Rise / Fall-Off Conditions 

At what cost then do these "pearls" come? A preliminary answer to the question
can be found in the fact that if the amplitude factor behaves essentially like the
phase factor $e^{i\alpha(0)x}$ within the region $-l < x < l$, then if only this region is sampled,
the probability for the N-spin transition

$$|-\gamma/2\rangle\otimes|-\gamma/2\rangle\otimes\cdots\otimes|-\gamma/2\rangle \to |\gamma/2\rangle\otimes|\gamma/2\rangle\otimes\cdots\otimes|\gamma/2\rangle \qquad (6.18)$$

is essentially the same as the unperturbed transition probability

$$|\langle\gamma/2|-\gamma/2\rangle|^{2N} = \cos^{2N}\left(\frac{\gamma}{2}\right). \qquad (6.19)$$

However small the probability is for a single spin, the N-spin probability is
exponentially smaller.

A second clue is found by looking at the global behavior of the amplitude
function $\langle\gamma/2|e^{i\hat{A}x/N}|-\gamma/2\rangle^N$. As we can see from the example of a single spin, the
measured observable $\hat{A} = \hat{S}_z$ induces a rotation around the z-axis. This means that
when in the N-spin case x takes the values $\pm N\pi$, the initial directions $|-\gamma/2\rangle$ are
rotated (up to a phase factor $(-1)^{N/2}$) into the final directions $|\gamma/2\rangle$. In such a case
the transition probability is unity. One must therefore have a behavior of the
likelihood factor away from x = 0 that reflects the rotation from a very unlikely
configuration, i.e., $(|-\gamma/2\rangle, |\gamma/2\rangle)$, to the very likely configuration $(|\gamma/2\rangle, |\gamma/2\rangle)$,
and so on. Inspection of the global behavior of the transition amplitude, in terms
of the log-likelihood factor and the local frequency of oscillation (see Fig. 6.2),
suggests that away from the super-oscillatory region the magnitude of the function
rises exponentially. This indeed is the "catch" in the phenomenon: super-oscillations
are suppressed exponentially in the amplitude function.

Let us now give a general argument as to why this exponential rise about
the super-oscillatory region is to be expected. Suppose one wishes to probe some
arbitrary transition with an amplitude function g(x) built up of modes of wave
number $|k| < k_{max}$,

$$g(x) = \int_{-k_{max}}^{k_{max}} dk\, e^{ikx}\,\tilde{g}(k), \qquad (6.20)$$

and which on the other hand shows super-oscillatory behavior $g(x) \sim e^{iKx}$ about
the region $-l < x < l$ with local wave number $K > k_{max} + \pi/l$. The intention is
then to isolate the region by choosing an appropriate test function $\phi_i(x)$ suppressing
the rise in magnitude away from the super-oscillatory region, so that

$$\int_{-\infty}^{\infty} dx\, e^{-ipx}\, g(x)\,\phi_i(x) \simeq \tilde{\phi}_i(p - K). \qquad (6.21)$$

Here, we denote Fourier transforms in the p-representation explicitly with a tilde.
Consider then the following function in the momentum representation:

$$\tilde{\phi}(p) = \begin{cases} e^{-\frac{1}{1 - (p/p_0)^2}}, & |p| < p_0 \\ 0, & |p| \geq p_0. \end{cases} \qquad (6.22)$$

This "bump" function (see Fig. 6.3) is common in analysis; its main property is
that while the function is clearly not analytic, it nevertheless has derivatives of all
orders for all values of p, including the bounds of its support $p = \pm p_0$. Now, from
the convolution theorem we know that $\tilde{\phi}_f(p)$ can be written exactly as

$$\tilde{\phi}_f(p) = \int dk\, \tilde{g}(k)\,\tilde{\phi}(p - k). \qquad (6.23)$$

It is clear therefore that if we choose $p_0 \simeq \pi/l$ in such a way that $k_{max} + p_0 < K$, then
$\tilde{\phi}_f(p)$ vanishes around the superoscillatory wave number $p \simeq K$ and the
function is inadmissible as a probe. In turn this entails that the Fourier transform
$\phi_i(x)$ of the "bump", which is a function peaked at x = 0 of width of order l, is
nevertheless unable to suppress the rise in magnitude of g(x) outside the super-oscillatory
region. In other words, the rise in g(x) has to be faster than the fall-off
rate of $\phi_i(x)$. It is then easy to show that $\phi_i(x)$ falls off faster at infinity than any
Figure 6.2: Global behavior of the superoscillatory function $\langle\gamma/2|e^{i\hat{A}x/N}|-\gamma/2\rangle^N$, in
terms of the local frequency of phase oscillation $\alpha_{(N)}(x)$ and the logarithm of the
likelihood factor, for $\alpha(0) = 5$, N = 50. The shaded strip indicates the region of
normal oscillatory behavior.
power of x. For this one notes that since $\tilde{\phi}(p)$ has derivatives of all orders, the
expectation value of any power of x can be written as

$$\langle x^n\rangle = \int dx\, x^n\,\phi_i(x)^2 = \int dp\, \tilde{\phi}(p)\left(i\frac{d}{dp}\right)^n\tilde{\phi}(p). \qquad (6.24)$$

Since furthermore $\tilde{\phi}(p)$ is bounded, one then has

$$\langle x^n\rangle < \infty \quad \forall\, n > 0. \qquad (6.25)$$

Taking even n = 2m, we then have for any m > 0,

$$\lim_{x\to\pm\infty} |x|^m\,\phi_i(x) = 0. \qquad (6.26)$$

We conclude that the rise in g(x) away from the super-oscillating region must be of
exponential order.



6.2 Illustration of Likelihood Effects in The Weak Regime 

We have thus seen how anomalously long periods of super-oscillatory behavior in
the phase of the amplitude function can occur in conjunction with an exponential
behavior of the likelihood factor. The combination of these two anomalous behaviors
provides a good illustration of two previously mentioned effects associated with the
likelihood factor in the weak regime:

Figure 6.4: The "stretch/squeeze" effect. The dotted lines indicate the likelihood
factor and the arrows the effect on the prior distribution.

The first is the "stretch/squeeze" effect. The effect is most noticeable when
the region is sampled precisely at the point of minimum likelihood, with the most
docile exponential distribution that is still robust enough to overcome the exponential
rise. In the case of the spins, the fall-off rate of the distribution is suggested
by the leading-order behavior of the log-likelihood factor around x = 0, which, as
one can see from Eq. (6.13), is quadratic. With this suggestion, the test function
$\phi_i(x)$ should be a Gaussian, and a numerical calculation shows that indeed it does
the job. We show this in Fig. 6.4 for the case of $\alpha(0) = 5$, N = 50, and an initial
Gaussian of width $\sigma = \pi/4$ in x. The sharp rise of the likelihood factor in both
directions around x = 0 entails a posterior distribution in x that is wider by a factor
of approximately

$$\frac{1}{\sqrt{1 - \frac{2\alpha(0)^2\sigma^2}{50}}} \simeq 1.6. \qquad (6.27)$$
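The size of the stretch can be estimated with a few lines of code. The sketch below compares a Gaussian-approximation stretch factor with the width ratio obtained by summing the full N-spin likelihood factor against a Gaussian prior. It assumes that $\sigma$ is the standard deviation of the prior probability distribution in x and that the log-likelihood is $N\log[1 + (4\alpha(0)^2 - 1)\sin^2(x/2N)]$; the function names and grid parameters are hypothetical.

```python
import math


def gaussian_stretch_factor(a0, n, sigma):
    """Posterior/prior width ratio in the Gaussian approximation."""
    return 1.0 / math.sqrt(1.0 - 2.0 * a0 ** 2 * sigma ** 2 / n)


def numerical_stretch_factor(a0, n, sigma, x_max=12.0, n_points=24001):
    """Same ratio from the full N-spin likelihood factor, by direct summation."""
    h = 2.0 * x_max / (n_points - 1)
    norm = second = 0.0
    for k in range(n_points):
        x = -x_max + k * h
        log_prior = -x ** 2 / (2.0 * sigma ** 2)
        log_like = n * math.log(
            1.0 + (4.0 * a0 ** 2 - 1.0) * math.sin(x / (2.0 * n)) ** 2)
        weight = math.exp(log_prior + log_like)   # unnormalized posterior
        norm += weight
        second += weight * x ** 2
    return math.sqrt(second / norm) / sigma


if __name__ == "__main__":
    a0, n, sigma = 5.0, 50, math.pi / 4.0
    print("Gaussian approximation:", gaussian_stretch_factor(a0, n, sigma))
    print("full likelihood factor:", numerical_stretch_factor(a0, n, sigma))
```

The full calculation gives a somewhat smaller ratio than the Gaussian estimate because the log-likelihood rises slightly more slowly than quadratically away from the origin.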

This stretch in x translates to a corresponding squeeze in the distribution in p,
which is shown shifted by the sampled weak value $\alpha(0) \simeq 5$. Note that although
the posterior distribution in x is wider than the prior, the dispersion in $\alpha$ is
still small enough for the squeeze to be evident in the pointer variable distribution.
As mentioned earlier, the relative uncertainty is suppressed by an additional factor
$1/\sqrt{N}$.

Figure 6.5: The "shift" effect. While the prior distribution in x is centered around
$x = 2\pi$, the likelihood factor rises so fast that the posterior distribution ends up
centered at $x \simeq 3\pi$. The dotted lines indicate the likelihood
factor and the arrows the effect on the prior distribution.

A second likelihood effect is the "shift". The effect sets in as the location of
the sampled region is moved away from the minimum likelihood point, in which case
the likelihood factor overwhelmingly favors one direction in x. Again, if the distribution
is docile enough the effect can become noticeable. We illustrate this in Fig.
6.5 with the same settings as before, except that the location of the sampled point
is now taken to be $x = 2\pi$. In this case, the location of the posterior distribution,
call it $x'$, is given by the solution to the equation

$$\frac{x' - 2\pi}{(\pi/4)^2} = \frac{d}{dx'}\log L_{(50)}(x'). \qquad (6.28)$$

This turns out to be, numerically, $x' \simeq 9.03$, which is close to $3\pi$. The effect is then
evidenced in the pointer variable distribution by the fact that the "kick", instead
of being the weak value at $x = 2\pi$, i.e., $\alpha(2\pi/50) \simeq 3.6$, turns out to be about
30% smaller: the weak value at $x = 3\pi$, $\alpha(3\pi/50) \simeq 2.7$. This is interpreted as a
reflection of the fact that the mean rotation angle of the spins is $3\pi/50$, as opposed
to the $2\pi/50$ expected a priori.

Figure 6.6: The biased weak value $\alpha(x'/N)$ vs. the actual weak value for different
prior locations x and the same uncertainty $\sigma = \pi/4$. The jagged behavior at the
peaks is due to instabilities in the root-location algorithm.

Finally, we show in Fig. 6.6 the results of a numerical calculation for a
situation where one "scans" the super-oscillatory region with the same initial test 
function but centered at different locations. For a given prior location x, the figure 
shows the "biased" weak value at the corresponding displaced location x' vs. the 
actual weak value at x. As expected, the bias is always towards regions of increasing 
likelihood where the weak value is smaller. This explains the "tightening" of the 
weak value curve. 



6.3 Negative Kinetic Energies 

Another interesting illustration of super-oscillatory behavior is provided by a particle
initially prepared in an eigenstate of the energy and post-selected by a position
measurement in a classically disallowed region. A sufficiently weak measurement
of the kinetic energy operator should then yield a negative value [16]. An example
that can be solved exactly is provided by a particle prepared in the ground state of
a simple harmonic oscillator, with Hamiltonian

$$\hat{H} = \frac{\hat{k}^2}{2m} + \frac{1}{2}m\omega^2\hat{q}^2. \qquad (6.29)$$

In the ground state $|0\rangle$, $\hat{H}$ has the eigenvalue $E = \omega/2$. If the particle is post-selected
in a position q, then the weak value of the kinetic energy operator $\hat{T} = \hat{k}^2/2m$,
immediately before the post-selection, is

$$\tau(q, 0) = \frac{\langle q|\hat{H} - \frac{1}{2}m\omega^2\hat{q}^2|0\rangle}{\langle q|0\rangle} = \frac{\omega}{2} - \frac{1}{2}m\omega^2q^2. \qquad (6.30)$$

Thus, in the rare event in which q happens to lie outside the region determined by
the classical turning points, $|q| < 1/\sqrt{m\omega}$, the weak value $\tau$ is a negative number.

To analyze this effect, we consider the amplitude function for such a measurement,
which is given by

$$\sqrt{L(x)}\,e^{iS(x)} \propto \langle q|e^{i\hat{T}x}|0\rangle. \qquad (6.31)$$

From the point of view of the transformations generated by $\hat{T}$, we see that the
amplitude may be interpreted as the diffusion of an initial wave function $\psi_0(q) =
\langle q|0\rangle$ with diffusion constant $D = -i/2m$ through the time x:

$$\sqrt{L(x)}\,e^{iS(x)} \propto e^{-i\frac{x}{2m}\frac{\partial^2}{\partial q^2}}\,\psi_0(q), \qquad (6.32)$$

where $\psi_0(q)$ is the ground-state wave function of the harmonic oscillator,

$$\psi_0(q) = \left(\frac{m\omega}{\pi}\right)^{1/4} e^{-\frac{m\omega}{2}q^2}. \qquad (6.33)$$

The diffusion problem is elementary to solve for a Gaussian. Up to inessential
constants, the amplitude function is given by

$$\sqrt{L(x)}\,e^{iS(x)} \propto \frac{1}{\sqrt{1 - ix\omega}}\,e^{-\frac{m\omega q^2}{2(1 - ix\omega)}}. \qquad (6.34)$$

From this we may then extract the likelihood factor and the phase:

$$L(x) \propto \frac{1}{\sqrt{1 + x^2\omega^2}}\,e^{-\frac{m\omega q^2}{1 + x^2\omega^2}}, \qquad S(x) = \frac{1}{2}\arctan(x\omega) - \frac{1}{2}m\omega^2q^2\,\frac{x}{1 + x^2\omega^2}, \qquad (6.35)$$

and finally, from the phase, the local weak value $\tau(q, x) = S'(x)$:

$$\tau(q, x) = \frac{1}{1 + x^2\omega^2}\left[\frac{\omega}{2} - \frac{m\omega^2q^2}{2}\,\frac{1}{1 + x^2\omega^2}\right] + \frac{m\omega^4q^2x^2}{2\,(1 + x^2\omega^2)^2}. \qquad (6.36)$$



We illustrate the behavior of the likelihood factor and the weak value $\tau(q, x)$ in
Figure 6.7.

This behavior of the local weak value may be understood in terms of two
quantities: an x-dependent effective frequency

$$\omega(x) = \frac{\omega}{1 + x^2\omega^2}, \qquad (6.37)$$

and a de Broglie momentum of the particle at the location and time of the post-selection,

$$\kappa(x, q) = \frac{\partial S}{\partial q} = -\frac{m\omega^2 q x}{1 + x^2\omega^2}. \qquad (6.38)$$

We note that this momentum is nothing more than the weak value of the momentum
for the diffused state, i.e.,

$$\kappa(x, q) = \mathrm{Re}\,\frac{\langle q|\hat{k}\,e^{i\hat{T}x}|0\rangle}{\langle q|e^{i\hat{T}x}|0\rangle}. \qquad (6.39)$$
Since this momentum vanishes when x = 0, it may be thought of as a momentum
that the apparatus imparts to the particle. The local weak value may then be
expressed as a "bound" term plus a "free" term: the kinetic energy weak value for
a harmonic oscillator with a renormalized frequency $\omega(x)$, plus the kinetic energy
of a free particle with the de Broglie momentum:

$$\tau(q, x) = \left[\frac{\omega(x)}{2} - \frac{1}{2}m\,\omega(x)^2q^2\right] + \frac{\kappa(x, q)^2}{2m}. \qquad (6.40)$$

Considering then a post-selection in which $q \gg 1/\sqrt{m\omega}$, two regimes are
clearly identifiable depending on the parameter x.

As $x \to 0$, the renormalized frequency coincides with the initial frequency
and the de Broglie momentum vanishes. The behavior is therefore that of a bound
particle in the classically forbidden region, the signature of which is a negative
weak value:

$$\tau(q, x) \simeq \frac{\omega}{2} - \frac{1}{2}m\omega^2q^2 < 0. \qquad (6.41)$$

As Figure 6.7 then shows, this anomalous behavior is accompanied by a considerable
"dip" in the likelihood function. Clearly, if the particle is barely disturbed, then it
is only a rare event in which it will be found in the classically forbidden region. As
x is increased away from this region, we see at around $x \simeq 1/\omega$ a quick jump in the
weak value from negative to positive, while the likelihood function is still small. This
may be seen as the competition between the bound and free behaviors exhibited by
$\tau(q, x)$, where the bound part still contributes a negative kinetic energy, indicating
that q is still in a classically disallowed region, but the free part contributes just
enough to overcome this barrier.

On the other hand, the exponential jump in the likelihood function indicates
a transition to a free regime where it would not have been surprising to have found
the particle at large values of q. As one can easily see, this transition occurs when
$x\omega$ is of the order of $q\sqrt{m\omega}$, which is the value necessary to lower the effective
binding so that q lies in the classically allowed region. Beyond this, as $x \to \infty$, the
renormalized frequency goes down as $1/x^2\omega$ and the de Broglie momentum takes
the form of a kinetic momentum with x playing the role of time:

$$\kappa(x, q) \simeq -\frac{mq}{x}; \qquad (6.42)$$

the particle behaves essentially as a free particle with the expected kinetic energy

$$\tau(q, x) \simeq \frac{\kappa(x, q)^2}{2m}. \qquad (6.43)$$
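The decomposition of the local kinetic-energy weak value into a bound term and a free term, described above, can be checked numerically against the phase of the diffused Gaussian amplitude. The sketch below assumes units with $\hbar = 1$, the amplitude $(1 - i\omega x)^{-1/2}\exp[-m\omega q^2/2(1 - i\omega x)]$ up to constants, and the sign convention $\kappa = \partial S/\partial q$; the function names and the sample values m = $\omega$ = 1, q = 3 are illustrative assumptions.

```python
import cmath


def amplitude(q, x, m=1.0, omega=1.0):
    """Amplitude <q|exp(i T x)|0>, up to an x-independent constant."""
    z = 1.0 - 1j * omega * x
    return cmath.exp(-m * omega * q ** 2 / (2.0 * z)) / cmath.sqrt(z)


def tau_closed_form(q, x, m=1.0, omega=1.0):
    """Local weak value tau = S'(x) in closed form."""
    d = 1.0 + (omega * x) ** 2
    return omega / (2.0 * d) - m * omega ** 2 * q ** 2 * (1.0 - (omega * x) ** 2) / (2.0 * d ** 2)


def tau_bound_plus_free(q, x, m=1.0, omega=1.0):
    """Bound term with renormalized frequency plus free-particle term."""
    d = 1.0 + (omega * x) ** 2
    w_eff = omega / d                              # renormalized frequency
    kappa = -m * omega ** 2 * q * x / d            # de Broglie momentum dS/dq
    bound = w_eff / 2.0 - 0.5 * m * w_eff ** 2 * q ** 2
    free = kappa ** 2 / (2.0 * m)
    return bound + free


def tau_numeric(q, x, m=1.0, omega=1.0, h=1e-6):
    """Phase derivative Im(g'/g) by a central difference in x."""
    g = amplitude(q, x, m, omega)
    dg = (amplitude(q, x + h, m, omega) - amplitude(q, x - h, m, omega)) / (2.0 * h)
    return (dg / g).imag


if __name__ == "__main__":
    q = 3.0   # well outside the turning point 1/sqrt(m*omega) = 1
    for x in (0.0, 0.7, 1.0, 2.0):
        print(x, tau_closed_form(q, x), tau_bound_plus_free(q, x), tau_numeric(q, x))
```

For q = 3 the weak value starts at -4 at x = 0, crosses zero near $x \simeq 1/\omega$, and is positive beyond, in line with the two regimes discussed in the text.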
6.4 A Weak to Strong "Phase Transition"



It was suggested earlier that the qualitative difference in the conditional statistics of 
the weak and strong regimes of measurement could possibly be an indication of two 
entirely different dynamical regimes in the measurement interaction, separated by 
a critical transition region. In exploring this possibility, we have found that a wide 
number of interesting phenomena of this kind can indeed be identified and inter- 
preted with relative ease by examining the global behavior of amplitude functions 
which locally exhibit super-oscillations. Thus far, we have seen how by probing the 
anomalous region with relatively sharp test functions, the exponential rise of the 
likelihood factor entails relatively mild effects on the overall shape and location of 
the pointer-variable distribution. On the other hand, if the probe is so wide that 
it cannot compete with the rise in the likelihood factor, the effect of the latter is 
to produce "dents" in the posterior distribution in x, as described for instance in
Chapter 4, Fig. 4.2. The appearance of dents may then be interpreted as the passage
to another regime in measurement strength. We shall now give a simple example
in which this other regime turns out to be the "strong" regime itself, where the 
conditional distribution exhibits a quantized structure. 

We recall from Chapter 2 the example in which the initial and final states of the system
are the coherent states $|\pm\lambda\rangle$, for instance of a simple harmonic oscillator, for which
the weak value of the occupation number operator $\hat{N}$ is $-|\lambda|^2$. Let us then revisit
this effect from within our model.

For this we compute the amplitude function:

$$\sqrt{L(x)}\,e^{iS(x)} \propto \langle-\lambda|e^{i\hat{N}x}|\lambda\rangle. \qquad (6.44)$$

As we have done previously, we emphasize the role of the observable as a generator
of unitary transformations and of x as a transformation parameter. Here $\hat{N}$ acts as a
generator of rotations in the semi-classical phase space of coherent states. Thus, we
may think of x as being an angle by which, for instance, the initial coherent state
is rotated clockwise in this space, i.e., $e^{i\hat{N}x}|\lambda\rangle = |\lambda e^{ix}\rangle$. Now, using the spectral
decomposition of $\hat{N}$, we may easily compute $\langle-\lambda|e^{i\hat{N}x}|\lambda\rangle$ in closed form:

$$\langle-\lambda|e^{i\hat{N}x}|\lambda\rangle = e^{-|\lambda|^2 - |\lambda|^2e^{ix}}. \qquad (6.45)$$

Hence, we see that the action for this rotation is

$$S(x) = -|\lambda|^2\sin x, \qquad (6.46)$$

while the likelihood factor is

$$L(x) \propto \exp\left[-2|\lambda|^2\cos x\right]. \qquad (6.47)$$

The reaction to the rotation is then the local weak value of $\hat{N}$, call it $\nu(x)$:

$$\nu(x) = S'(x) = -|\lambda|^2\cos x, \qquad (6.48)$$

and indeed we see that it takes the value — |A| 2 at the point of null rotation x = 0. 

Moreover, we see for large $|\lambda|$ another example of a super-oscillating
function, in this case a series of positive frequency modes, the phase of which shows
a negative local frequency of oscillation 50% of the time. And again the "catch":
the periods where the function shows superoscillation correspond precisely to those
periods where the rotation angle $x$ is such that the two coherent states
$|-\lambda\rangle$ and $|\lambda e^{ix}\rangle$ are opposed by an angle of more than
$\pi/2$, where the overlap is minimal (Fig. 6.8).

What is nice about this example is that for large values of $|\lambda|^2$, it provides
a very simple illustration of a transition from one regime to the other depending
on the width of the initial test function $\phi_i(x)$ (see Fig. 6.9). For this, we consider
an initial minimum-uncertainty preparation for $\phi_i(x)$ with a standard deviation
$\sigma$ in $x$, centered around $x = 0$. Apart from a normalization factor, the relative
initial wave function, here denoted simply as $\phi_i^{(\lambda)}$, may then be expressed as

$$\phi_i^{(\lambda)}(x) = \sqrt{L(x)}\,\phi_i(x) \propto \exp\left[ -|\lambda|^2 \cos(x) - \frac{x^2}{4\sigma^2} \right] . \tag{6.49}$$



As we can see, close to $x = 0$ the factor $\sqrt{L(x)}$ behaves as
$\propto e^{+|\lambda|^2 x^2/2}$. This means that for a weak measurement of the
"impossible" value $\nu(0) = -|\lambda|^2$, $\phi_i(x)$ should fall off fast enough to
suppress this exponential rise; a weakness condition is therefore

$$\sigma < \frac{1}{\sqrt{2}\,|\lambda|} . \tag{6.50}$$

Under such conditions $\phi_i^{(\lambda)}$ has a single peak around $x = 0$ and, if
sufficiently sharp, may be treated in a Gaussian approximation about $x = 0$,

$$\phi_i^{(\lambda)}(x) \simeq (2\pi\sigma_{\rm eff}^2)^{-1/4}\, e^{-x^2/4\sigma_{\rm eff}^2} , \tag{6.51}$$

where the effective width is given by

$$\sigma_{\rm eff} = \frac{\sigma}{\sqrt{1 - 2|\lambda|^2\sigma^2}} . \tag{6.52}$$



Figure 6.8: Local weak value of the occupation number operator and the respective
transition probabilities for three different values of $\lambda$. The shaded regions
indicate where the amplitude exhibits superoscillation.

Figure 6.9: Critical behavior in passing from the weak to strong regimes, as a
function of the criticality parameter $\epsilon = 2\sigma^2|\lambda|^2$, with
$|\lambda|^2 = 25$ (see text). The dotted lines indicate the initial distributions in
both representations.

As before, the posterior distribution in $x$ shows the characteristic stretching
discussed earlier.

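This Gaussian approximation breaks down, however, as $\sigma$ approaches the
critical value $\sigma = 1/(\sqrt{2}\,|\lambda|)$. If $|\lambda|^2$ is sufficiently large, the
behavior around this critical region can be described by keeping only the quadratic
and quartic terms in the exponential, in which case

$$\phi_i^{(\lambda)}(x) \sim \exp\left[ -\left( \frac{1}{4\sigma^2} - \frac{|\lambda|^2}{2} \right) x^2 - \frac{|\lambda|^2}{24}\, x^4 \right] . \tag{6.53}$$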



Close to the critical region, one then has the characteristic behavior of a
second-order phase transition: at the critical point, only the quartic term contributes.
One then has a distribution the variance of which scales as

$$\Delta x^2 = \frac{\int dx\, x^2\, e^{-|\lambda|^2 x^4/12}}{\int dx\, e^{-|\lambda|^2 x^4/12}} \propto |\lambda|^{-1} . \tag{6.54}$$

Clearly, for large enough $|\lambda|$ the critical point can be reached well within the
super-oscillatory region, where the average shift of the pointer is still close to
$-|\lambda|^2$.

Now, as $\sigma$ is increased away from its critical value, the point $x = 0$ becomes
a local minimum and the distribution acquires two peaks. Defining a criticality
parameter

$$\epsilon = 2\sigma^2|\lambda|^2 , \tag{6.55}$$

the two peaks are located, close to the critical point $\epsilon = 1$, at

$$\bar{x} \simeq \pm\sqrt{6} \left( \frac{\epsilon - 1}{\epsilon} \right)^{1/2} . \tag{6.56}$$

If one performs a Gaussian approximation about each peak, the resultant variance
there goes as

$$\Delta x^2 \simeq \frac{\epsilon}{4|\lambda|^2(\epsilon - 1)} . \tag{6.57}$$

One should then expect the distribution to break up into two well-separated
distributions when $\Delta x < \bar{x}$, which implies that

$$\frac{\epsilon - 1}{\epsilon} > \frac{1}{\sqrt{24}\,|\lambda|} . \tag{6.58}$$

For large $|\lambda|$ this again entails that the separation occurs for very moderate
deviations of $\sigma$ about its critical value, in which case the two peaks still lie
within the super-oscillatory region.

Figure 6.10: Positive solutions for the location of the peaks as a function of the
critical parameter $\epsilon$.

As $\sigma$ is further increased away from this critical region, the two peaks rapidly
separate due to the exponential increase in the likelihood factor towards the regions
of overwhelming likelihood $x = \pm\pi$, where say the initial state $|\lambda\rangle$ is
rotated to $|-\lambda\rangle$. Again, for large $|\lambda|$ each peak may be treated in the
Gaussian approximation, where the location $\bar{x}$ of each is given by the first
non-vanishing solutions to the equation

$$\bar{x} = \epsilon \sin(\bar{x}) . \tag{6.59}$$
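
The peak condition can be solved numerically in a few lines. The sketch below is only
an illustration: it finds the first positive root of $x = \epsilon \sin x$ by bisection
on $(0, \pi)$ and compares it with the near-critical estimate
$\sqrt{6(\epsilon - 1)/\epsilon}$. The function names and tolerances are assumptions
introduced here.

```python
import math

def peak_location(eps, tol=1e-12):
    """First positive root of x = eps*sin(x); returns 0.0 when eps <= 1 (single peak)."""
    if eps <= 1.0:
        return 0.0
    f = lambda x: x - eps * math.sin(x)
    # f is convex on (0, pi) with f(0) = 0 and f'(0) = 1 - eps < 0, while f(pi) = pi > 0,
    # so there is exactly one sign change in this interval.
    lo, hi = 1e-9, math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def peak_near_critical(eps):
    """Estimate sqrt(6 (eps - 1)/eps), valid close to eps = 1."""
    return math.sqrt(6.0 * (eps - 1.0) / eps)

for eps in (1.05, 1.5, 5.0, 50.0):
    print(eps, peak_location(eps), peak_near_critical(eps))
```

For $\epsilon$ only slightly above 1 the two values agree closely, while for large
$\epsilon$ the root approaches $\pi$, the region of maximum likelihood.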

The positive roots of this equation are shown in Fig. 6.10 as a function of $\epsilon$.
Up to normalization, $\phi_i^{(\lambda)}(x)$ may then be written as

$$\phi_i^{(\lambda)}(x) \propto e^{-\frac{(x - \bar{x})^2}{4\sigma_{\rm eff}^2}} + e^{-\frac{(x + \bar{x})^2}{4\sigma_{\rm eff}^2}} , \tag{6.60}$$

with an effective width $\sigma_{\rm eff}$. For the effective width, we note that the weak
value $\nu(x)$ is symmetric about the origin, and that the second logarithmic derivative
of $\phi_i^{(\lambda)}(x)$ evaluated at the peak $\bar{x}$ is

$$-\frac{1}{2\sigma^2} + |\lambda|^2 \cos(\bar{x}) = -\frac{1}{2\sigma^2} - \nu(\bar{x}) , \tag{6.61}$$

so that the width can be expressed in terms of the local weak value at $\bar{x}$ as

$$\sigma_{\rm eff}^2 = \frac{1}{\sigma^{-2} + 2\nu(\bar{x})} . \tag{6.62}$$

Now, as each peak samples the same weak value, the average kick of the pointer
variable is given approximately by

$$\langle p_f \rangle = \nu(\bar{x}) \simeq -|\lambda|^2 \cos(\bar{x}) . \tag{6.63}$$

We can then see that as $\sigma$ is increased, the kick goes from $-|\lambda|^2$ in the
super-oscillatory region to the weak value at the regions $x = \pm\pi$ of maximum
likelihood, $\nu(\pm\pi) = +|\lambda|^2$.

However, once the peaks are separated in $x$-space, the resultant distribution
in $p$ exhibits interference fringes. Each peak contributes in the relative wave function
for $p$ a phase factor $e^{\mp i(p\bar{x} - S(\bar{x}))}$ corresponding to its location in
$x$, i.e.,

$$\phi_f^{(\lambda)}(p) \propto \int_{-\infty}^{\infty} dx\, e^{-ipx + iS(x)}\, \phi_i^{(\lambda)}(x) \simeq 2\cos\left[ p\bar{x} - S(\bar{x}) \right] \int_{-\infty}^{\infty} d\xi\, e^{-i(p - \nu(\bar{x}))\xi}\, e^{-\xi^2/4\sigma_{\rm eff}^2} , \tag{6.64}$$

where we have used the fact that $S(x)$ is odd and $\nu(x)$ is even. Using the defining
equations for $\bar{x}$ and $\epsilon$, the final pointer variable distribution takes the
form of a Gaussian packet, of width $1/2\sigma_{\rm eff}$, times a modulation factor
coming from the interference between the two peaks:

$$dP(p|\phi_i^{(\lambda)}) \propto \exp\left[ -\frac{(p - \nu(\bar{x}))^2}{2\Delta p_f^2} \right] \cos^2\left[ \bar{x} \left( p + \frac{|\lambda|^2}{\epsilon} \right) \right] dp , \tag{6.65}$$

where

$$\Delta p_f^2 = \Delta p_i^2 + \frac{1}{2}\nu(\bar{x}) \tag{6.66}$$

and $\Delta p_i = 1/2\sigma$. As one can then see, when the criticality parameter
$\epsilon$ becomes large, $\bar{x} \to \pi$, and around the region $p \simeq |\lambda|^2$
the modulation factor becomes a maximum at integer values of $p$ and zero at half-integer
values. One thus obtains a distribution in $p$, centered at the weak value
$p = +|\lambda|^2$, of variance $\Delta p^2 = \Delta p_i^2 + |\lambda|^2/2$, and
which reflects the positive spectrum of the occupation number operator.
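
The passage of the mean kick from $-|\lambda|^2$ to $+|\lambda|^2$ can be checked
numerically in two independent ways. The sketch below, given under stated assumptions,
computes the mean pointer reading (a) by sampling the local weak value
$-|\lambda|^2\cos x$ with weight $L(x)\phi_i(x)^2$, and (b) from the standard spectral
picture, in which the final pointer amplitude is a superposition of Gaussians centered
on the integers with coefficients $(-|\lambda|^2)^n/n!$. The value $|\lambda|^2 = 4$,
the grid sizes, and all function names are illustrative choices, not taken from the
text.

```python
import numpy as np

LAM2 = 4.0  # illustrative |lambda|^2 (assumption)

def mean_kick_sampling(eps, lam2=LAM2, npts=40001):
    """<p> as the average of nu(x) = -lam2*cos(x) over the weight L(x)*phi_i(x)^2."""
    sigma2 = eps / (2.0 * lam2)                 # from eps = 2 sigma^2 |lambda|^2
    half = 12.0 * np.sqrt(sigma2)
    x = np.linspace(-half, half, npts)
    weight = np.exp(-2.0 * lam2 * np.cos(x) - x**2 / (2.0 * sigma2))
    nu = -lam2 * np.cos(x)
    return np.sum(nu * weight) / np.sum(weight)

def pointer_distribution(eps, lam2=LAM2, nmax=80, npts=40001):
    """Unnormalized P(p) = |sum_n c_n exp(-sigma^2 (p-n)^2)|^2, c_n = (-lam2)^n / n!."""
    sigma2 = eps / (2.0 * lam2)
    dp_i = 0.5 / np.sqrt(sigma2)                # initial pointer spread 1/(2 sigma)
    half = lam2 + 15.0 * dp_i + 15.0 * np.sqrt(lam2)
    p = np.linspace(-half, half, npts)
    amp = np.zeros_like(p)
    c = 1.0
    for n in range(nmax):
        amp += c * np.exp(-sigma2 * (p - n) ** 2)
        c *= -lam2 / (n + 1)
    return p, amp**2

def mean_kick_spectral(eps, lam2=LAM2):
    """<p> computed directly from the pointer distribution."""
    p, prob = pointer_distribution(eps, lam2)
    return np.sum(p * prob) / np.sum(prob)

for eps in (0.2, 20.0):
    print(eps, mean_kick_sampling(eps), mean_kick_spectral(eps))
```

Both routes give the same mean, negative and close to $-|\lambda|^2$ in the weak regime
and positive in the strong regime; plotting the output of `pointer_distribution` for
large $\epsilon$ shows the fringes at integer $p$.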

The beautiful thing is that in this way we arrive at an alternative description of
the emergent quantized structure in the conditional distribution of the data. According
to the non-linear model, the initially sharp wave function in $p$ corresponds
to a wide function in $x$ in which the tails brush two regions of maximum likelihood
$x = \pm\pi$. Each region corresponds to a possible rotation of the initial state to the
final state, where the signs denote the sense of rotation. While the a priori probability
of either rotation is quite small, the fact that the final state was indeed
$|-\lambda\rangle$ entails an enormous probability that in fact the initial state was
rotated. This is then reflected in the two narrow peaks at $x = \pm\pi$, and the fact
that the shift is the weak value $|\lambda|^2 = \langle -\lambda|\hat{N}|-\lambda\rangle$,
corresponding to the same initial and final states. What is then seen in terms of the
standard linear model as a superposition of shifts of the initial narrow packet in $p$,
with an envelope given by the spectral amplitudes, is in the non-linear model the wide
distribution corresponding to a weak measurement at the rotated configuration of the
system, but modulated by an interference pattern generated by two different phases
acquired along the two possible senses of rotation. It is also interesting to note that
close to the critical region ($\epsilon = 1.1$), one also obtains from the interference
of the two peaks a sort of quantization in which the "eigenvalues" now fall on
non-integer negative numbers.



6.5 Overall Distribution of Weak Values 

The "pearls" we have dealt with in the above examples are admittedly quite rare. 
Given a particular post-selection, the probability of finding them is exponentially 
small. Even then, one must be extremely careful in the preparation of the apparatus 
so that indeed one samples those exponentially suppressed regions. One may wonder 
therefore as to how unlikely are "eccentric" weak values overall? 

To answer this question, let us consider the probability distribution of weak
values when only an initial condition $|\psi\rangle$ is given and no additional
information is known about the final state. So far, we have dealt with fixed final bases,
i.e., $B = \{|\psi_f\rangle\}$. Information about the basis is already relevant
information, as it singles out only a handful of all possible pairs of initial and final
states. The distribution is then given by

$$dP(\alpha|\psi B) = d\alpha \sum_{|\psi_f\rangle \in B} |\langle\psi_f|\psi\rangle|^2\, \delta\left[ \alpha - {\rm Re}\, \frac{\langle\psi_f|\hat{A}|\psi\rangle}{\langle\psi_f|\psi\rangle} \right] . \tag{6.67}$$

As we have seen earlier, the average $\bar{\alpha}$ of the distribution is the
expectation value of $\hat{A}$ given $|\psi\rangle$, $\langle A \rangle$, and is thus
basis-independent. On the other hand, the remaining information contained in this
distribution, i.e. the scatter about its average, is basis-dependent.

To obtain a basis-independent expression, we should then consider all possible
final states that may occur under all possible post-selections that one may
envision, giving prior probabilities to each final state. In this case, the weight factor
which is naturally defined is the Hilbert-space overlap between the initial and final
states. Thus one has

$$dP(\alpha|\psi) = d\alpha\, \frac{\int d\psi_f\, |\langle\psi_f|\psi\rangle|^2\, \delta\left[ \alpha - {\rm Re}\, \frac{\langle\psi_f|\hat{A}|\psi\rangle}{\langle\psi_f|\psi\rangle} \right]}{\int d\psi_f\, |\langle\psi_f|\psi\rangle|^2} , \tag{6.68}$$

where $d\psi_f$ is a uniform measure over all states $|\psi_f\rangle$ in Hilbert space.
Note that in fact the integral overcounts each final state, since two states differing
only by a phase factor are equivalent; this overcounting is taken care of by the
normalization factor in the denominator. To calculate this integral, it becomes more
convenient however to express it as a marginal distribution of the overall distribution
$d^2P(\alpha\beta|\psi) = d^2P(z|\psi)$ for

$$z = \alpha + i\beta = \frac{\langle\psi_f|\hat{A}|\psi\rangle}{\langle\psi_f|\psi\rangle} , \tag{6.69}$$

both the real and imaginary parts of the complex weak value:

$$dP(\alpha|\psi) = \int_{\beta} d^2P(\alpha\beta|\psi) . \tag{6.70}$$

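The two-dimensional probability distribution for $z$ is then given by

$$d^2P(z|\psi) = d^2z\, \frac{\int d\psi_f\, |\langle\psi_f|\psi\rangle|^2\, \delta^2\left( z - \frac{\langle\psi_f|\hat{A}|\psi\rangle}{\langle\psi_f|\psi\rangle} \right)}{\int d\psi_f\, |\langle\psi_f|\psi\rangle|^2} , \tag{6.71}$$

where $d^2z = d\alpha\, d\beta$ and the complex delta function for a complex number
$z = x + iy$ is defined as

$$\delta^2(z - z_0) = \delta(x - x_0)\, \delta(y - y_0) . \tag{6.72}$$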

The integral (6.71) is easily evaluated if one notes a simple trick that
yields an optimal parametrization of the final state: for any hermitian operator
$\hat{A}$, its action on a quantum state can be written as

$$\hat{A}|\psi\rangle = \langle A \rangle |\psi\rangle + \Delta A\, |\psi_\perp\rangle , \tag{6.73}$$

where $\langle A \rangle$ is the standard expectation value
$\langle\psi|\hat{A}|\psi\rangle$, the vector $|\psi_\perp\rangle$ is a certain
state orthogonal to $|\psi\rangle$ and $\Delta A$ is the standard uncertainty

$$\Delta A = \sqrt{\langle A^2 \rangle - \langle A \rangle^2} . \tag{6.74}$$

This allows us then to select a frame of mutually orthogonal vectors comprised of
$|\psi\rangle$, $|\psi_\perp\rangle$ and some other vectors $|i\rangle$, $N - 2$ of them
if $N$ is the dimensionality of the Hilbert space. One may then expand $|\psi_f\rangle$
in that frame as

$$|\psi_f\rangle = w_1 |\psi\rangle + w_2 |\psi_\perp\rangle + \sum_{i=3}^{N} w_i |i\rangle , \tag{6.75}$$

where the complex coefficients $\{w_i\}$ are bound by the constraint

$$\sum_{i=1}^{N} |w_i|^2 = 1 . \tag{6.76}$$



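With this parametrization, the integral (6.71) becomes the complex integral

$$d^2P(z|\psi) = d^2z\, \frac{\int \prod_{i=1}^{N} d^2w_i\; \delta\left( 1 - \sum_{i=1}^{N} |w_i|^2 \right) |w_1|^2\, \delta^2\left( z - \langle A \rangle - \Delta A\, \frac{w_2^*}{w_1^*} \right)}{\int \prod_{i=1}^{N} d^2w_i\; \delta\left( 1 - \sum_{i=1}^{N} |w_i|^2 \right) |w_1|^2} . \tag{6.77}$$

We now note a useful property for the 2-d complex delta function:

$$\delta^2(wz - wz_0) = \frac{1}{|w|^2}\, \delta^2(z - z_0) , \tag{6.78}$$

in terms of which one obtains:

$$d^2P(z|\psi) = \frac{d^2z}{\Delta A^2}\, \frac{\int \prod_{i=1}^{N} d^2w_i\; \delta\left( 1 - \sum_{i=1}^{N} |w_i|^2 \right) |w_1|^4\, \delta^2\left( w_2^* - \frac{z - \langle A \rangle}{\Delta A}\, w_1^* \right)}{\int \prod_{i=1}^{N} d^2w_i\; \delta\left( 1 - \sum_{i=1}^{N} |w_i|^2 \right) |w_1|^2} . \tag{6.79}$$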



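The rightmost delta function fixes the value of $w_2$ as a function of $w_1$, and hence
integrating over $w_2$ we have for the constraint delta-function:

$$\delta\left( 1 - \sum_{i=1}^{N} |w_i|^2 \right) \to \delta\left( 1 - \left[ 1 + \left| \frac{z - \langle A \rangle}{\Delta A} \right|^2 \right] |w_1|^2 - \sum_{i=3}^{N} |w_i|^2 \right) . \tag{6.80}$$

Performing the change of variables

$$w_1 \to \frac{w_1}{\sqrt{1 + \left| \frac{z - \langle A \rangle}{\Delta A} \right|^2}} \tag{6.81}$$

in the upper integral, one obtains:

$$d^2P(z|\psi) = K\, \frac{d^2z}{\Delta A^2}\, \frac{1}{\left[ 1 + \left| \frac{z - \langle A \rangle}{\Delta A} \right|^2 \right]^3} , \tag{6.82}$$

where the normalization constant is

$$K = \frac{\int \prod_{i \neq 2} d^2w_i\; |w_1|^4\, \delta\left( 1 - \sum_{i \neq 2} |w_i|^2 \right)}{\int \prod_{i=1}^{N} d^2w_i\; |w_1|^2\, \delta\left( 1 - \sum_{i=1}^{N} |w_i|^2 \right)} . \tag{6.83}$$

Finally, computing this constant instead by imposing the normalization condition

$$K^{-1} = \int \frac{d^2r}{[1 + r^2]^3} = \frac{\pi}{2} , \tag{6.84}$$

we obtain for the 2-dimensional distribution

$$d^2P(\alpha\beta|\psi) = \frac{2}{\pi}\, \frac{d\alpha\, d\beta}{\Delta A^2}\, \frac{1}{\left[ 1 + \left( \frac{\alpha - \langle A \rangle}{\Delta A} \right)^2 + \left( \frac{\beta}{\Delta A} \right)^2 \right]^3} . \tag{6.85}$$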



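The distribution then shows that the complex weak value of $\hat{A}$ is symmetrically
distributed about $z = \langle A \rangle$, with a width of order $\Delta A$.

Concentrating finally on the real part, we find after integrating over $\beta$ the
marginal distribution

$$dP(\alpha|\psi) = \frac{3}{4}\, \frac{d\alpha}{\Delta A}\, \frac{1}{\left[ 1 + \left( \frac{\alpha - \langle A \rangle}{\Delta A} \right)^2 \right]^{5/2}} , \tag{6.86}$$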



which is shown in Fig. 6.11. The distribution admits two non-trivial central
moments, the mean and variance, which are easily computed:

$$\bar{\alpha} = \langle A \rangle , \qquad \Delta\alpha = \frac{\Delta A}{\sqrt{2}} . \tag{6.87}$$

Again we note that while for any observable $\hat{A}$ the overall distribution of weak
values extends all the way to infinity (unless, of course, $|\psi\rangle$ is an
eigenstate of $\hat{A}$), the concentration of weak values is nevertheless tighter about
the mean than the concentration of eigenvalues given the spectral distribution
$\langle\psi|\Pi_a|\psi\rangle$.

To answer then the question posed at the beginning of the section as to how
unlikely are eccentric weak values, let us consider as a representative example an
operator $\hat{A}$, the spectrum of which is bounded by $\pm a_{\max}$, and a state
$|\psi\rangle$ yielding a uniform distribution of eigenvalues within this interval. In
such case $\langle A \rangle = 0$ and $\Delta A = a_{\max}/\sqrt{3}$; the probability of a
weak value outside the spectrum is therefore

$$P(|\alpha| > a_{\max}|\psi) = 1 - \frac{3}{4} \int_{-\sqrt{3}}^{\sqrt{3}} \frac{dx}{[1 + x^2]^{5/2}} \simeq 0.026 . \tag{6.88}$$

It is interesting to note therefore that when all possible final states are taken into
account, the relative proportion of eccentric weak values is of the order of one in a
hundred, clearly not an extraordinarily small number.
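
The distribution (6.86) lends itself to a simple Monte Carlo check. The sketch below is
illustrative only: it draws uniformly distributed final states in a finite-dimensional
Hilbert space, weights each by its overlap with the initial state, and compares the
weighted variance and tail probability of the real weak value with the analytic
expressions. The dimension, the choice of a diagonal operator with equally spaced
eigenvalues, the sample size, the seed, and the function names are all assumptions
introduced for the example. For a finite equally spaced spectrum $\Delta A$ is slightly
larger than $a_{\max}/\sqrt{3}$, so the sampled tail is compared with the analytic tail
at the actual $a_{\max}/\Delta A$.

```python
import numpy as np

def sample_weak_values(a_diag, psi, n_samples=200_000, seed=0):
    """Real weak values of A = diag(a_diag) for random final states, with weights |<psi_f|psi>|^2."""
    rng = np.random.default_rng(seed)
    dim = len(psi)
    # Normalized complex Gaussian vectors are uniformly distributed over pure states.
    f = rng.normal(size=(n_samples, dim)) + 1j * rng.normal(size=(n_samples, dim))
    f /= np.linalg.norm(f, axis=1, keepdims=True)
    overlap = f.conj() @ psi                    # <psi_f|psi>
    numer = f.conj() @ (a_diag * psi)           # <psi_f|A|psi>
    return (numer / overlap).real, np.abs(overlap) ** 2

def tail_probability(t):
    """P(|alpha - <A>| > t*DeltaA) obtained by integrating (3/4)(1 + u^2)^(-5/2)."""
    s = t / np.sqrt(1.0 + t * t)
    return 1.0 - 1.5 * (s - s**3 / 3.0)

DIM = 8                                          # illustrative dimension (assumption)
a_diag = np.linspace(-1.0, 1.0, DIM)             # spectrum bounded by a_max = 1
psi = np.ones(DIM, dtype=complex) / np.sqrt(DIM) # equal weight on every eigenvalue
mean_a = np.sum(a_diag) / DIM                    # <A>, zero here
delta_a = np.sqrt(np.sum(a_diag**2) / DIM - mean_a**2)

alpha, w = sample_weak_values(a_diag, psi)
var_ratio = np.sum(w * (alpha - mean_a) ** 2) / np.sum(w) / delta_a**2
p_outside = np.sum(w * (np.abs(alpha) > 1.0)) / np.sum(w)
print(var_ratio, p_outside, tail_probability(1.0 / delta_a), tail_probability(np.sqrt(3.0)))
```

The weighted variance should come out close to $\Delta A^2/2$, and
`tail_probability(np.sqrt(3.0))` reproduces the value quoted in Eq. (6.88).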
Chapter 7

Conclusion and Open Questions 



The model presented in this dissertation may be regarded as a modest step in a 
more ambitious program suggested by the Two-Vector Formulation, namely, the
construction of a general theory of measurement in quantum mechanics based
entirely on time-symmetric ensembles and weak values. It may be worthwhile then to
give a brief account of what has been achieved here as well as to point out several 
questions that remain open for future exploration in this direction. 

As a preliminary motivation for the non-linear model, we have suggested a 
sort of complementarity between two "ideal" measurement situations, the standard 
or strong measurement scheme and the weak measurement scheme, each of which 
corresponds to the initial conditions of the measuring apparatus being controlled 
for either optimal precision or conversely, for minimal disturbance of the measured 
system. A clear distinction between the two extremes becomes evident when the 
statistics are analyzed against fixed initial and final conditions on the system: in one 
extreme, the statistics exhibit a spectral distribution for the measured observable, 
whereas in the other the apparatus appears to show a response to a definite weak 
value. By identifying these two extremes, the intermediate "limbo" region of
non-ideal measurements becomes of considerable interest, as one may expect that the
transition from one description to the other is accompanied by a qualitative change
in the physics of the measurement interaction.

As a way of bridging the two descriptions, we have suggested with the non- 
linear model an alternative picture based on weak values for general non-ideal von 
Neumann-type measurements. In this description, the apparatus is seen as driving 
the system, via back-reaction, into various "configurations", i.e., pairs of initial and
final states, parameterized by what we have termed the reaction variable of the
apparatus. Each configuration determines a local weak value for the measured
observable as well as a weight factor, the likelihood factor. The non-linear model
may thus be viewed as the "quantized" version of a picture which in fact proves 
to have a direct classical correspondence: the possible configurations of the system 
are "sampled" with a probability distribution for the reaction variable determined 
by the likelihood factor, and from each configuration the pointer variable receives a 
corresponding "kick" proportional to the local weak value. While direct quantitative 
agreement with the classical picture of statistical sampling is attained only in the 
expectation value of the pointer variable, the picture of sampling nevertheless proves 
useful in analyzing the response of the apparatus at the level of wave functions, where 
the resulting quantum state of the apparatus can be decomposed as a superposition 
of weak measurements. The non-linear model therefore provides a complement to 
the more standard analysis based on the spectral decomposition of the measured 
observable. 

The underlying motivation for this dual description is, as mentioned in the 
introduction, to gain a further understanding of the physics of the measurement 
interaction. The "phase-transition" at the end of Chapter 6 gives a particularly good 
example of a situation in which one may benefit from this dual description, as it is 
from the point of view of the reaction variable where one sees a qualitative change 
in the physics of the interaction as one crosses from the weak to strong regimes at 
a definite critical measurement strength. Such transitions should in fact be quite 
generic as one only needs to identify situations where the likelihood factor exhibits 
a drastic "dip" such as for instance around regions of anomalous superoscillatory 
behavior. It should be interesting therefore to characterize the degree of universality 
in these transitions. 

It would also be desirable to further explore how the standard ideal measure- 
ment scheme relates to the picture of sampling weak values. In Chapter 3 and the 
"phase-transition" example in Chapter 6 we have already given two examples where 
the emergence of a quantized structure in the resulting distribution of the data is 
viewed, from the sampling picture, as an interference phenomenon in the quantum- 
mechanical response of the apparatus to a non-linear effective action. From the 
point of view of the non-linear model therefore, quantization appears to be more 
of an emergent property of the whole measurement interaction as opposed to an 
intrinsic property of the system in isolation. 

It may then be worthwhile to pursue this idea further in systems, such as a
spin-1/2, considered to be "intrinsically" quantized. In particular, we recall how in
the case of orbital angular momentum described in Chapter 3, a local sampling of
the weak value reveals the classical angular momentum, whereas integer value
quantization emerges only from a global sampling in a manner akin to the appearance
of band-structures under periodic potentials. Could it then not be the case that
in a similar fashion, underlying the two "bands" in a Stern-Gerlach measurement
of spin-1/2 is in fact a continuous angular momentum vector, such as for instance
the one defined by the weak values of the three spin components (Fig. II)? The
non-linear model already suggests how this apparently contradictory picture can be
reconciled with quantization: the quantized structure of the apparatus wave function
would come from the periodicity in the sampling, in addition to a likelihood factor
which effectively suppresses unusually high values of angular momentum outside of
the usual range $[-1/2, 1/2]$. The idea is certainly interesting and novel enough to
warrant further investigation.

In this respect, another aspect worth exploring is the "configuration" space of 
the system that is sampled in the measurement process according to the Two-Vector
description. In the original formulation (SlIElIZj) both the real and imaginary parts 
of the complex weak value are viewed as being equally fundamental elements of the 
physical property associated with the measured observable. To specify univocally 
the complex weak values for all elements of the observable algebra, one therefore 
needs to assign an ordered pair of state vectors, as the imaginary part of $\langle\psi_2|\hat{A}|\psi_1\rangle / \langle\psi_2|\psi_1\rangle$ is
odd under a time reversal of the boundary conditions. In the present dissertation, 
however, we have shown that it is only the real part of the weak value which has 
a straightforward interpretation in terms of mechanical effects as it can be related 
directly to a unitary transformation. Furthermore, we have traded the local de- 
scription provided by the imaginary part for the more natural global description in 
terms of probability re-assessment provided by the likelihood factor. It is therefore 
tempting to consider a point in the "configuration" space as being defined in terms 
of a minimal object from which both the likelihood factor and the real weak values 
can be obtained. A candidate for this object is for instance the hermitian operator 



$$\Omega = \frac{1}{2} \left[ \frac{|\psi_1\rangle\langle\psi_2|}{\langle\psi_2|\psi_1\rangle} + \frac{|\psi_2\rangle\langle\psi_1|}{\langle\psi_1|\psi_2\rangle} \right] , \tag{7.1}$$

in terms of which the weak value of a given observable $\hat{A}$ is
$\alpha = {\rm Tr}[\hat{A}\Omega]$ and the weight factor
$|\langle\psi_2|\psi_1\rangle|^2$ associated with a given pair of vectors is
$(2\,{\rm Tr}[\Omega^2] - 1)^{-1}$. Besides the obvious time reversal symmetry
$|\psi_1\rangle \leftrightarrow |\psi_2\rangle$, a given $\Omega$ defines a
whole equivalence class of pairs connected by a non-trivial continuous
$U(1) \times U(1)$ transformation. It may therefore be worthwhile to investigate the
significance of this degeneracy as well as the geometry of the configuration space
defined by such objects.
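
The two properties claimed for the operator in Eq. (7.1) are easy to verify numerically.
The following sketch, with randomly chosen normalized states and a random Hermitian
observable (all of them illustrative assumptions, as are the function names and the
seed), checks that the trace against the observable gives the real part of the weak
value and that the overlap weight is recovered from the trace of the squared operator.

```python
import numpy as np

def omega(psi1, psi2):
    """Omega = (1/2)[ |psi1><psi2| / <psi2|psi1> + h.c. ] for normalized state vectors."""
    overlap = np.vdot(psi2, psi1)                     # <psi2|psi1> (vdot conjugates its first argument)
    term = np.outer(psi1, psi2.conj()) / overlap      # |psi1><psi2| / <psi2|psi1>
    return 0.5 * (term + term.conj().T)

def random_state(rng, dim):
    """Normalized random complex vector (illustrative helper)."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(1)
dim = 4
psi1, psi2 = random_state(rng, dim), random_state(rng, dim)
m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
A = 0.5 * (m + m.conj().T)                            # arbitrary Hermitian observable

om = omega(psi1, psi2)
weak = np.vdot(psi2, A @ psi1) / np.vdot(psi2, psi1)  # complex weak value
weight = abs(np.vdot(psi2, psi1)) ** 2                # |<psi2|psi1>|^2

print(np.trace(A @ om).real, weak.real)
print(1.0 / (2.0 * np.trace(om @ om).real - 1.0), weight)
```

Swapping the two arguments of `omega` returns the same matrix, which is the time
reversal symmetry mentioned in the text.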
Another related point that needs to be pursued with greater care has to do
with the single measurement event. So far, we have tried to establish a connection
between the overall statistical distribution of the pointer variable and an underlying
distribution of sampled weak values. Suppose however we are dealing with a single
reading of the pointer variable. What can we then infer about the weak values?
This seems to be a rather subtle question, as the weak value distribution and the
pointer distribution are ultimately related in the same way that the probability
distributions for two canonically conjugate variables are related, that is, at the level
of wave functions through a Fourier transform. The idea of applying Bayes' theorem
to obtain a posterior distribution of weak values is therefore hindered to the same
extent that we cannot obtain a positive-definite joint probability distribution for
two canonically conjugate variables.

A way of working around this situation may be to trace the weak value in 
question but now on the system-apparatus composite, as the apparatus reading 
completes the necessary information for a two-vector description of the composite 
system. This however brings additional difficulties. Intuitively, one should expect 
that if the measurement interaction is sufficiently weak, the information provided by 
a single reading should not significantly modify the free history of weak values of the 
system. On the other hand, one need not expect this to be the case when dealing 
with strong measurements, as a single reading already entails a re-assessment of the
two-vector pair to the same extent to which in the standard formulation it entails a
"collapse" of the wave function. Such problems demand a more careful examination
and may be indicative of the type of difficulties that lie ahead in attempting a more 
rigorous ontological interpretation of the measuring process in terms of weak values. 